An Efficient Regex Matching Utility

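This article has one goal: faster regex matching, achieved by cutting the time spent compiling regular expressions. It is aimed at systems that do a lot of regex matching.

The core of the speedup is a local cache combined with fewer compilations. Two ideas motivate this. The first is Effective Java (Item 6, "Avoid creating unnecessary objects"), which warns against recompiling the same regex on every call. The second is an analogy with database connections, where the TCP setup cost is paid once and the connection is then reused. If conditions allow, you can also collect the regular expressions your system uses most often and load them into memory at system startup. As an optional extra, you can collect usage data and preload patterns into the utility ahead of time from a scheduled job or a background thread.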
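Here is a minimal sketch of "compile once, reuse many times" plus the optional startup preload. It assumes only the JDK (no Caffeine). The class name PatternRegistry and its methods warmUp, get and size are hypothetical. The list passed to warmUp stands in for whatever frequently used expressions you have collected. Unlike the Caffeine-based utility below, this map is unbounded, so it only suits a fixed, known set of expressions.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

// Hypothetical registry: each expression is compiled at most once and the Pattern is reused.
final class PatternRegistry {
    // Unbounded map: only suitable when the set of expressions is known and small.
    private static final Map<String, Pattern> PATTERNS = new ConcurrentHashMap<>();

    private PatternRegistry() {
    }

    // Preload frequently used expressions, e.g. at application startup or from a scheduled job.
    static void warmUp(List<String> expressions) {
        for (String expression : expressions) {
            get(expression);
        }
    }

    // computeIfAbsent compiles only on a miss; later calls return the same cached instance.
    // An invalid expression throws PatternSyntaxException and nothing is stored.
    static Pattern get(String expression) {
        return PATTERNS.computeIfAbsent(expression, Pattern::compile);
    }

    static int size() {
        return PATTERNS.size();
    }

    public static void main(String[] args) {
        warmUp(List.of("[0-9]{3}-[0-9]{4}", "^[a-z]+$"));
        System.out.println(get("[0-9]{3}-[0-9]{4}").matcher("333-8889").matches()); // true
    }
}
```

The design choice is to pay the compile cost before the first real request arrives. After that, requests only create a cheap Matcher.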

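Maven dependency. The version element is omitted; look up a current version on Maven Central yourself. The code below also uses StringUtils from Apache Commons Lang 3 and StopWatch from Spring (spring-core), so those libraries must be on the classpath too.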


    com.github.ben-manes.caffeine
    caffeine

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import org.apache.commons.lang3.StringUtils;
import org.springframework.util.StopWatch;

import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/**
 * description: regular expression utility class
 */
public class RegUtils {
    private RegUtils() {
    }

    // Compiled patterns keyed by their source expression; at most 512 entries are kept.
    public static final Cache<String, Pattern> REG_PATTERNS = Caffeine.newBuilder().maximumSize(512).build();

    /**
     * Returns the compiled Pattern for an expression, compiling and caching it on first use.
     * @param expression regular expression source
     * @return the compiled (possibly cached) Pattern
     */
    public static Pattern getRegPattern(String expression) {
        if (StringUtils.isEmpty(expression)) {
            throw new IllegalArgumentException("expression is empty");
        }
        // Cache hit: skip compilation entirely.
        Pattern ifPresent = REG_PATTERNS.getIfPresent(expression);
        if (Objects.nonNull(ifPresent)) {
            return ifPresent;
        }
        // Cache miss: compile once and store the result for later calls.
        Pattern compile;
        try {
            compile = Pattern.compile(expression);
        } catch (PatternSyntaxException e) {
            throw new IllegalArgumentException("expression error:" + expression + " " + e.getMessage(), e);
        }
        REG_PATTERNS.put(expression, compile);
        return compile;
    }

    /**
     * Regex replacement used instead of String#replaceAll, which compiles the regex on every call.
     * @param value input text
     * @param reg regular expression
     * @param replacement replacement string
     * @return the text with every match replaced
     */
    public static String replaceReg(String value, String reg, String replacement) {
        return getRegPattern(reg).matcher(value).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String expression = "[0-9]{3}[-]{1}[0-9]{4}";
        String data = "333-8889";

        // First call: cache miss, so the measured time includes Pattern.compile.
        StopWatch stopWatch = new StopWatch(expression);
        stopWatch.start();
        boolean matches = getRegPattern(expression).matcher(data).matches();
        stopWatch.stop();
        System.out.println(matches);
        System.out.println(stopWatch.prettyPrint());

        // Later calls: cache hit, the compiled Pattern is reused.
        List<String> strings = new ArrayList<>();
        strings.add("333-8889"); // true
        for (String string : strings) {
            StopWatch stopWatch1 = new StopWatch(expression);
            stopWatch1.start();
            boolean matches1 = getRegPattern(expression).matcher(string).matches();
            stopWatch1.stop();
            System.out.println(matches1);
            System.out.println(stopWatch1.prettyPrint());
        }
    }
}
```
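If you would rather not add the Caffeine dependency, the same bounded cache can be approximated with the JDK alone. The sketch below uses a LinkedHashMap in access order that evicts the least recently used entry once a size limit is exceeded. It is a swapped-in technique, not the author's code, and the class name LruPatternCache is hypothetical. With Caffeine itself, Cache.get(key, mappingFunction), as in REG_PATTERNS.get(expression, Pattern::compile), would fold the getIfPresent/put pair above into a single call.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical JDK-only alternative to the Caffeine cache: a bounded LRU map of compiled patterns.
final class LruPatternCache {
    private final Map<String, Pattern> patterns;

    LruPatternCache(int maxSize) {
        // accessOrder = true keeps entries ordered from least to most recently accessed.
        this.patterns = new LinkedHashMap<String, Pattern>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Pattern> eldest) {
                // Evict the least recently used entry once the limit is exceeded.
                return size() > maxSize;
            }
        };
    }

    // LinkedHashMap is not thread-safe and lookups reorder entries in access-order mode,
    // so every access is synchronized.
    synchronized Pattern get(String expression) {
        return patterns.computeIfAbsent(expression, Pattern::compile);
    }

    synchronized int size() {
        return patterns.size();
    }

    synchronized boolean contains(String expression) {
        return patterns.containsKey(expression);
    }
}
```

A single lock is simpler than Caffeine's concurrent design and may become a bottleneck under heavy contention. That trade-off is a reason to keep Caffeine when it is already available.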

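Comparison results

On the first call the pattern is compiled and put into the cache. The second call fetches it straight from the cache, so it is dramatically faster than compiling the pattern again.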
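To reproduce the comparison without Spring's StopWatch, the rough sketch below times String.matches against a reused precompiled Pattern. String.matches compiles the expression on every call. It uses the article's expression, but the class name CompileCostDemo and the iteration count are hypothetical. The timings are only indicative because the sketch does not control for JIT warm-up; for trustworthy numbers, a harness such as JMH is the better tool.

```java
import java.util.regex.Pattern;

// Hypothetical micro-benchmark: recompiling on every call vs. reusing one compiled Pattern.
final class CompileCostDemo {
    static final String EXPRESSION = "[0-9]{3}[-]{1}[0-9]{4}";
    static final Pattern CACHED = Pattern.compile(EXPRESSION);

    // String.matches compiles EXPRESSION again inside every call.
    static int countWithoutCache(String input, int iterations) {
        int hits = 0;
        for (int i = 0; i < iterations; i++) {
            if (input.matches(EXPRESSION)) {
                hits++;
            }
        }
        return hits;
    }

    // Reuses the precompiled Pattern; only a new Matcher is created per call.
    static int countWithCache(String input, int iterations) {
        int hits = 0;
        for (int i = 0; i < iterations; i++) {
            if (CACHED.matcher(input).matches()) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int iterations = 100_000;
        long t0 = System.nanoTime();
        countWithoutCache("333-8889", iterations);
        long t1 = System.nanoTime();
        countWithCache("333-8889", iterations);
        long t2 = System.nanoTime();
        System.out.printf("String.matches: %d ms%n", (t1 - t0) / 1_000_000);
        System.out.printf("cached Pattern: %d ms%n", (t2 - t1) / 1_000_000);
    }
}
```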


Original article: https://blog.csdn.net/haohaounique/article/details/127955546