Fuzzy String Matching in Python (fuzzywuzzy package)

from fuzzywuzzy import fuzz
from fuzzywuzzy import process

1. Key methods

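ratio: compares the two complete strings; it scores high only when they are (nearly) identical character for character.
partial_ratio: compares the shorter string with the best-matching substring of the longer one; it scores high when one string is largely contained in the other.
token_sort_ratio: splits both strings into whitespace-separated tokens and sorts them before comparing, so it still matches well when the word order differs.
token_set_ratio: compares the sets of tokens, so it still matches well when the word order differs and some tokens are repeated.
By default the two token methods also lowercase the strings and replace punctuation with spaces before comparing, while ratio and partial_ratio compare the raw strings. This is why the mixed-case English example below scores 100 with token_sort_ratio but only 30 with ratio.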
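To make the "partial" in partial_ratio concrete, here is a minimal sketch that slides the shorter string across the longer one and keeps the best window score. It is a simplified stand-in built on Python's standard difflib, not fuzzywuzzy's actual code (the library chooses candidate windows from matching blocks and may use python-Levenshtein), so its scores can differ from the library's. The name simple_partial_ratio is hypothetical.

```python
from difflib import SequenceMatcher


def simple_partial_ratio(a: str, b: str) -> int:
    """Hypothetical helper (simplified, not fuzzywuzzy's code): best 0-100
    similarity between the shorter string and any same-length window of
    the longer string."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not shorter:
        return 0  # nothing to compare
    n = len(shorter)
    best = 0.0
    # Slide a window of len(shorter) across the longer string.
    for start in range(len(longer) - n + 1):
        window = longer[start:start + n]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return round(100 * best)


print(simple_partial_ratio("州", "郑州市"))           # 100: "州" occurs verbatim
print(simple_partial_ratio("love", "I love YOU"))  # 100
```

Because only the best window counts, extra text around a match in the longer string does not lower the score, which is what distinguishes this idea from a whole-string ratio.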

>>> fuzz.ratio("西藏 自治区", "自治区 西藏")
50
>>> fuzz.partial_ratio("西藏 自治区", "自治区 西藏")
50
>>> fuzz.ratio('I love YOU','YOU LOVE I')
30
>>> fuzz.partial_ratio('I love YOU','YOU LOVE I')
30
>>> fuzz.token_sort_ratio("西藏 自治区", "自治区 西藏") 
100
>>> fuzz.token_sort_ratio('I love YOU','YOU LOVE I')
100
>>> fuzz.ratio("西藏 西藏 自治区", "自治区 西藏")
40
>>> fuzz.token_sort_ratio("西藏 西藏 自治区", "自治区 西藏")
80
>>> fuzz.token_set_ratio("西藏 西藏 自治区", "自治区 西藏")
100
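The token methods can be approximated in a few lines to show why the scores above come out as they do. This is a sketch under stated assumptions: it uses difflib instead of python-Levenshtein, only lowercases and splits on whitespace (the library also strips punctuation), and the helper names are hypothetical. The token_set part follows the same idea as the library: compare the shared tokens against each side's full sorted token list and keep the best score.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """Hypothetical difflib-based approximation of fuzz.ratio (0-100)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def simple_token_sort_ratio(a: str, b: str) -> int:
    """Lowercase, split on whitespace, sort the tokens, rejoin, then compare."""
    def norm(s: str) -> str:
        return " ".join(sorted(s.lower().split()))
    return simple_ratio(norm(a), norm(b))


def simple_token_set_ratio(a: str, b: str) -> int:
    """Compare shared tokens with each side's full token set; keep the best."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    common = " ".join(sorted(ta & tb))
    combined_a = (common + " " + " ".join(sorted(ta - tb))).strip()
    combined_b = (common + " " + " ".join(sorted(tb - ta))).strip()
    return max(
        simple_ratio(common, combined_a),
        simple_ratio(common, combined_b),
        simple_ratio(combined_a, combined_b),
    )


print(simple_token_sort_ratio("I love YOU", "YOU LOVE I"))       # 100
print(simple_token_sort_ratio("西藏 西藏 自治区", "自治区 西藏"))  # 80
print(simple_token_set_ratio("西藏 西藏 自治区", "自治区 西藏"))   # 100
```

Sorting removes the effect of word order, but the duplicated "西藏" still lengthens one side, so token_sort scores 80. Turning the tokens into sets also removes the duplicate, so token_set reaches 100.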

2. The process methods

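process.extract and process.extractOne take a query string and a list of candidate strings, and return the candidates most similar to the query. The difference is that process.extract returns up to limit matches (5 by default), while process.extractOne returns only the single best match. The scoring function can be chosen with the scorer parameter, which accepts any of the key methods above; when it is omitted, fuzz.WRatio is used.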

>>> choices = ["河南省", "郑州市", "湖北省", "武汉市"]
>>> process.extract("州", choices, limit=2)
[('郑州市', 90), ('河南省', 0)]
>>> process.extractOne("州", choices)
('郑州市', 90)

>>> choices = ["河南省", "郑州市", "湖北省", "武汉市"]
>>> process.extract("州郑 ", choices, limit=2)
[('郑州市', 45), ('河南省', 0)]
>>> process.extractOne("州郑 ", choices)
('郑州市', 45)
>>> process.extract("州郑 ", choices, limit=2, scorer=fuzz.token_set_ratio)
[('郑州市', 40), ('河南省', 0)]
>>> process.extractOne("州郑 ", choices, scorer=fuzz.token_set_ratio)
('郑州市', 40)
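Conceptually, both process functions score every choice with the scorer and then keep the best ones. The sketch below illustrates that idea; it is not fuzzywuzzy's implementation, the helper names are hypothetical, and it defaults to a plain difflib ratio rather than fuzz.WRatio with the library's string preprocessing, so it reports 50 where the library reports 90 for the query "州".

```python
from difflib import SequenceMatcher
from typing import Callable, List, Optional, Sequence, Tuple

Scorer = Callable[[str, str], int]


def simple_ratio(a: str, b: str) -> int:
    """Hypothetical difflib-based 0-100 similarity (stand-in for a fuzz scorer)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def simple_extract(query: str, choices: Sequence[str],
                   scorer: Scorer = simple_ratio,
                   limit: int = 5) -> List[Tuple[str, int]]:
    """Score every choice, sort by score (stable for ties), keep the top `limit`."""
    scored = [(choice, scorer(query, choice)) for choice in choices]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


def simple_extract_one(query: str, choices: Sequence[str],
                       scorer: Scorer = simple_ratio,
                       score_cutoff: int = 0) -> Optional[Tuple[str, int]]:
    """Return the single best (choice, score), or None if none reaches the cutoff."""
    best: Optional[Tuple[str, int]] = None
    for choice in choices:
        score = scorer(query, choice)
        if score >= score_cutoff and (best is None or score > best[1]):
            best = (choice, score)
    return best


choices = ["河南省", "郑州市", "湖北省", "武汉市"]
print(simple_extract("州", choices, limit=2))  # [('郑州市', 50), ('河南省', 0)]
print(simple_extract_one("州", choices))       # ('郑州市', 50)
```

Passing a different function as scorer changes only how each pair is scored, which mirrors how the scorer parameter is used in the examples above.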

3. References

FuzzyWuzzy uses the Levenshtein distance to measure the difference between strings (through the python-Levenshtein package when it is installed; otherwise it falls back to Python's built-in difflib.SequenceMatcher, whose scores are similar but not always identical). In information theory, linguistics, and computer science, the Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other[1].
A magic library for fuzzy matching in Python: FuzzyWuzzy
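The Levenshtein definition quoted above translates directly into the classic dynamic-programming algorithm. The following is a minimal pure-Python sketch for illustration; the function name levenshtein is our own, and FuzzyWuzzy itself delegates this work to python-Levenshtein or difflib and reports a normalized 0-100 similarity rather than this raw edit count.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Keeping only two rows reduces memory from O(len(a) * len(b)) to O(min(len(a), len(b))), while the time stays proportional to the product of the lengths.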


Original article: https://blog.csdn.net/zxxr123/article/details/132954042