Python Skill Tree Exercise: Word Frequency Count

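1. Problem

2. Given Code: Complete the TODO Section

3. Code Questions and Conclusions

3.1 get() with a custom default return value

3.2 values() and list()

3.3 min()

3.4 key=lambda in sort()

3.5 list[0:5]

3.6 print(f'...')

3.7 append()

3.8 join() is the inverse of split()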

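1. Problem

Write a Python word-count program that computes the frequency of English words in the two triple-quoted Python strings below. Requirements:


Ignore case when comparing words
Split words using the list splits = ['\n', ' ', '-', ':', '/', '*', '_', '(', ')', '"', '”', '“',']','[']
Output the 5 most frequent words together with their counts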


'''
* Python 代码风格指南',
    * [google-python-styleguide_zh_cn](https://zh-google-styleguide.readthedocs.io/en/latest/google-python-styleguide/python_style_rules /)
    * [PEP8](https://legacy.python.org/dev/peps/pep-0008/)
* 代码风格和自动完成工具链
    * 基本工具
        * [pylint](https://pylint.org/)
        * [autopep8](https://pypi.org/project/autopep8/)
    * Visual Studio Code Python 开发基本插件
        * Pylance
        * Python Path
        * Python-autopep8
'''


 '''
Every major open-source project has its own style guide: a set of conventions (sometimes arbitrary) about how to write code for that project. It is much easier to understand a large codebase when all the code in it is in a consistent style.
“Style” covers a lot of ground, from “use camelCase for variable names” to “never use global variables” to “never use exceptions.” This project (google/styleguide) links to the style guidelines we use for Google code. If you are modifying a project that originated at Google, you may be pointed to this page to see the style guides that apply to that project.
This project holds the C++ Style Guide, C# Style Guide, Swift Style Guide, Objective-C Style Guide, Java Style Guide, Python Style Guide, R Style Guide, Shell Style Guide, HTML/CSS Style Guide, JavaScript Style Guide, TypeScript Style Guide, AngularJS Style Guide, Common Lisp Style Guide, and Vimscript Style Guide. This project also contains cpplint, a tool to assist with style guide compliance, and google-c-style.el, an Emacs settings file for Google style.
'''

Expected result: (output screenshot not preserved in the source)

2. Given Code: Complete the TODO Section


def top_words(splits, text, top_n=5):
    i = 0
    word_dict = {}
    chars = []
    while i < len(text):
        c = text[i]
        if c in splits:
            # Skip the rest of a run of consecutive separator characters
            while i+1 < len(text) and text[i+1] in splits:
                i += 1
            word = ''.join(chars).lower()
 
            # Count word frequency
            # TODO(You): add your code here
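            # Skip the empty string produced when the text starts with a separator
            if word:
                word_info = word_dict.get(word)
                if word_info is None:
                    word_info = {'word': word, 'count': 1}
                    word_dict[word] = word_info
                else:
                    word_info['count'] += 1
            chars = []
        else:
            chars.append(c)

        i += 1

    # Count the final word when the text does not end with a separator
    word = ''.join(chars).lower()
    if word:
        word_info = word_dict.get(word, {'word': word, 'count': 0})
        word_info['count'] += 1
        word_dict[word] = word_info
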
    word_list = list(word_dict.values())
    top_n = min(top_n, len(word_list))
    word_list.sort(key=lambda word_info: word_info['count'], reverse=True)
    return word_list[0:top_n]
 
if __name__ == '__main__':
    google_style_guide = ...
    python_style_guides = ...
    # Same separator list as in the problem statement (includes '\n', ']' and '[')
    splits = ['\n', ' ', '-', ':', '/', '*', '_', '(', ')', '"', '”', '“', ']', '[']
 
    tops = top_words(splits, google_style_guide+python_style_guides)
 
    print('Word ranking')
    print('--------')
    i = 0
    while i < len(tops):
        top = tops[i]
        word = top['word']
        count = top['count']
        print(f'{i+1}. word: {word}, count: {count}')
        i += 1
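
As a cross-check for the hand-written loop above, the same count can be sketched with the standard library. This is not the exercise's intended solution: it swaps in re.split and collections.Counter, and the names top_words_counter and sample are made up for illustration.

```python
import re
from collections import Counter


def top_words_counter(splits, text, top_n=5):
    """Return the top_n (word, count) pairs, ignoring case."""
    # Build a character class from the separators; re.escape protects
    # characters such as '[' and '*' that are special inside a regex.
    pattern = '[' + ''.join(re.escape(s) for s in splits) + ']+'
    # re.split can yield empty strings at the edges, so filter them out.
    words = [w for w in re.split(pattern, text.lower()) if w]
    return Counter(words).most_common(top_n)


splits = ['\n', ' ', '-', ':', '/', '*', '_', '(', ')', '"', '”', '“', ']', '[']
sample = '* Style guide: a style-guide for [Python] style'  # hypothetical input
result = top_words_counter(splits, sample, top_n=2)
print(result)  # [('style', 3), ('guide', 2)]
```

Comparing this output with the output of top_words on the same text is a quick way to spot off-by-one mistakes in the manual loop, such as an empty first word or a dropped last word.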

3. Code Questions and Conclusions

3.1 get() with a custom default return value


word_info = word_dict.get(word, {'word': word, 'count': 0})
word_info['count'] += 1
word_dict[word] = word_info

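Understanding the code:

word_dict[word] looks up an entry using word as the key.

word_info['count'] looks up an entry using 'count' as the key.

In both cases the value inside the brackets is the key of a key/value pair.

word_info has the form {'word': <the word>, 'count': <its count>}.

The assignment in the following line was unclear at first:
word_info = word_dict.get(word[1], {"word": word[1], "count": 0})
Question: what does get() do when word_dict does not contain word? Is word_info empty?

To set your own return value for a missing key, pass it as the second argument, for example dict.get('key', 'never'). If the key is in the dictionary, get() returns the value stored under it; if not, it returns 'never'. Without a second argument, get() returns None for a missing key.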

dict1={'country':'China','capital':'Beijing'}
print(dict1.get('country'))                      # China
print(dict1.get('capital','never'))              # Beijing
print(dict1.get('provincial capital','never'))   # never
Source: https://blog.csdn.net/weixin_42709563/article/details/106593628

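So, given
word = 'test'
word_dict = {}
word_info = word_dict.get(word, {"word": word, "count": 0})
the value of word ('test') is not a key of word_dict (which is empty), so get(word, {"word": word, "count": 0}) returns the default directly and word_info is {'word': 'test', 'count': 0}.

Based on this understanding, the following example was run: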


word = 'test'
word2 = 'python'
word_dict = {}
# The keys are not in word_dict yet, so get() returns the default {"word": ..., "count": 0}
word_info = word_dict.get(word, {"word": word, "count": 0})
word_info2 = word_dict.get(word2, {"word": word2, "count": 0})
word_dict[word] = word_info
word_dict[word2] = word_info2
print(word_info)
print("-----------")
print(word_info2)
print("-----------")
print(word_dict)

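word_dict[word2] = word_info2

This line stores word_info2 in word_dict as the value, under the key given by the value of word2.

The output is:

{'word': 'test', 'count': 0}
-----------
{'word': 'python', 'count': 0}
-----------
{'test': {'word': 'test', 'count': 0}, 'python': {'word': 'python', 'count': 0}}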
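
One detail worth seeing in isolation: get() only returns the default, it never stores it. The short sketch below (variable names chosen for illustration) shows why the three-line pattern at the start of this section needs the final assignment word_dict[word] = word_info.

```python
word_dict = {}

# get() only returns the default; it does not store it in the dictionary.
word_info = word_dict.get('test', {'word': 'test', 'count': 0})
stored_before = 'test' in word_dict
print(word_info)       # {'word': 'test', 'count': 0}
print(stored_before)   # False

# The explicit assignment is what actually records the word.
word_info['count'] += 1
word_dict['test'] = word_info

# On the next lookup the key exists, so the stored dict is returned
# and the default is ignored.
again = word_dict.get('test', {'word': 'test', 'count': 0})
again['count'] += 1
print(word_dict)       # {'test': {'word': 'test', 'count': 2}}
```

Because again is the very same object that is stored in word_dict, incrementing its count updates the dictionary without a second assignment.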

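3.2 values() and list()

What does the assignment word_list = list(word_dict.values()) mean?

My understanding:
word_dict stores key/value pairs: {'test': {'word': 'test', 'count': 0}, 'python': {'word': 'python', 'count': 0}}

word_dict.values() returns a view of the values: dict_values([{'word': 'test', 'count': 0}, {'word': 'python', 'count': 0}])

list(word_dict.values()) copies them into a list: [{'word': 'test', 'count': 0}, {'word': 'python', 'count': 0}]

After conversion to a list:
the elements keep the dictionary's insertion order (they are ordered, not sorted by value)
each index corresponds to one element

So after word_list = list(word_dict.values()), elements can be read as word_list[0], word_list[1], and so on.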
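
The difference between the dict_values view and the list can be checked with a small sketch. The data mirrors the example above; view_indexable is a helper variable introduced only for this illustration.

```python
word_dict = {
    'test': {'word': 'test', 'count': 0},
    'python': {'word': 'python', 'count': 0},
}

values_view = word_dict.values()
word_list = list(values_view)

# A dict_values view cannot be indexed, but a list can.
try:
    values_view[0]
    view_indexable = True
except TypeError:
    view_indexable = False
print(view_indexable)   # False
print(word_list[0])     # {'word': 'test', 'count': 0}

# The view tracks later changes to the dict; the list is a snapshot.
word_dict['google'] = {'word': 'google', 'count': 0}
print(len(values_view), len(word_list))   # 3 2
```

This is why the exercise converts to a list first: sort() and indexing are list operations, and neither is available on the view.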

3.3 min()

top_n = min(top_n, len(word_list))
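min(n, m) simply returns the smaller of the two values, so top_n can never exceed the number of distinct words in word_list.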

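3.4 key=lambda in sort()

1. What does key=lambda mean in sort()?

2. What is the difference between these two calls?
word_list.sort(key=lambda t_word: t_word['count'])
word_list.sort(key=lambda word_info: word_info['count'], reverse=True)

Reference: "Python notes: lambda expressions and the key argument of sort" (CSDN blog), https://blog.csdn.net/lishuaigell/article/details/124158763
Summary of that post: a lambda expression declares an anonymous function, that is, a small throwaway function without a name, useful where a function is needed briefly but a full definition is not worth writing. Typical uses are the key parameter of the built-in sorted() and of the list method sort(), and the first argument of the built-ins map() and filter(). A lambda can also be assigned to a name. Its body is a single expression: statements and compound structures are not allowed, but the expression may call other functions, and its value is the function's return value.

From these notes, lambda word_info: word_info['count'] tells sort() to order the elements by each element's 'count' value.

The parameter name (t_word or word_info) makes no difference. Without reverse the sort is ascending; with reverse=True it is descending.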


word = 'liuYuXin'
word2 = 'python'
word3 = 'popping'
word4 = 'google'
word_dict = {}
# The keys are not in word_dict yet, so get() returns the given default
word_info = word_dict.get(word, {"word": word, "count": 2})
word_info2 = word_dict.get(word2, {"word": word2, "count": 1})
word_info3 = word_dict.get(word3, {"word": word3, "count": 5})
word_info4 = word_dict.get(word4, {"word": word4, "count": 7})
word_dict[word] = word_info
word_dict[word2] = word_info2
word_dict[word3] = word_info3
word_dict[word4] = word_info4
print(word_info)
print("-----------")
print(word_info2)
print("-----------")
print(word_dict)
print("-----------")
print(word_dict.values())
print("-----------")
print(list(word_dict.values()))
print("-----------")
top_n = 3
word_list = list(word_dict.values())
word_list.sort(key=lambda t_word: t_word['count'])
print(word_list[0:top_n])
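
To make the ascending and descending cases easier to compare, here is a compact sketch with the same four counts. It uses sorted() instead of sort() so that both orders can be kept side by side, and operator.itemgetter as an alternative to the lambda; both are standard library features rather than something the exercise requires.

```python
from operator import itemgetter

word_list = [
    {'word': 'liuyuxin', 'count': 2},
    {'word': 'python', 'count': 1},
    {'word': 'popping', 'count': 5},
    {'word': 'google', 'count': 7},
]

# Ascending by count (the default order).
ascending = sorted(word_list, key=lambda info: info['count'])
print([info['word'] for info in ascending])
# ['python', 'liuyuxin', 'popping', 'google']

# Descending by count, as a top-N ranking needs.
descending = sorted(word_list, key=lambda info: info['count'], reverse=True)
print([info['word'] for info in descending])
# ['google', 'popping', 'liuyuxin', 'python']

# itemgetter('count') builds the same key function as the lambda.
same = sorted(word_list, key=itemgetter('count'), reverse=True)
print(same == descending)   # True
```

Note that list.sort() reorders the list in place and returns None, while sorted() leaves the original untouched and returns a new list.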

3.5 list[0:5]

word_list[0:top_n]

- This returns a new list containing the elements at indices 0 through top_n - 1; the end index top_n itself is excluded.
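
A few slices on a plain list of numbers show the exclusive end index. The list counts is a stand-in for the sorted word list, chosen only to keep the output short.

```python
counts = [7, 5, 2, 1]

print(counts[0:3])    # [7, 5, 2]     indices 0, 1, 2; index 3 is excluded
print(counts[:3])     # [7, 5, 2]     a start of 0 can be omitted
print(counts[0:10])   # [7, 5, 2, 1]  an end past the list is clamped

first_three = counts[0:3]
first_three[0] = 0
print(counts[0])      # 7  the slice is a new list; the original is unchanged
```

Since an end index past the list is clamped, word_list[0:top_n] would be safe even without the min() call in section 3.3; the min() call mainly keeps top_n itself meaningful.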



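3.6 print(f'...')

print(f'{i + 1}. word: {word}, count: {count}')
The f prefix makes the literal an f-string: each expression inside {} is evaluated and its value is inserted into the string.

Here i + 1 and count are ints. The f-string formats them as text automatically, so they print together with the surrounding text without an explicit str() call.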
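
The following sketch compares the f-string with manual concatenation. The values of i, word and count are invented for the example.

```python
i = 0
word = 'style'
count = 12

line = f'{i + 1}. word: {word}, count: {count}'
print(line)   # 1. word: style, count: 12

# Plain concatenation needs an explicit str() for the ints.
same_line = str(i + 1) + '. word: ' + word + ', count: ' + str(count)
print(line == same_line)   # True

# A format spec after ':' controls the layout, e.g. right-align in 4 columns.
padded = f'{count:>4}'
print(repr(padded))   # '  12'
```

Leaving out the str() calls in the concatenated version raises a TypeError, which is the problem the f-string avoids.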

3.7 append()

chars.append(c)
chars = []

chars.append(c) adds c to the end of the list chars. (chars must be a list; a dict has no append() method.)
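
A minimal sketch of how the exercise builds a word one character at a time; the input 'Py3' is arbitrary.

```python
chars = []                   # must be a list; a dict ({}) has no append()
for c in 'Py3':
    chars.append(c)          # adds c at the end, in place
print(chars)                 # ['P', 'y', '3']

result = chars.append('!')   # append() mutates the list and returns None
print(result)                # None
word = ''.join(chars).lower()
print(word)                  # py3!
```

Because append() returns None, writing chars = chars.append(c) is a common mistake that replaces the list with None.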

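3.8 join() is the inverse of split()

word = ''.join(chars).lower()

' '.join(xxx) joins the string elements of xxx into one string, with a space between elements.

'-'.join(xxx) joins them with - between elements.

''.join(chars) joins them with nothing in between, which turns the list of characters back into a word.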


l = ['8', '3', '2', '7', '1']
s = ' '.join(l)
print(s)


Style 1:
>>> symbol="-"    # separator
>>> seq=["lyx","love","um"]   # sequence of strings
>>> symbol.join(seq)
'lyx-love-um'
Style 2: skip the separator variable and call join() on the literal directly
>>> seq=["lyx","love","um"]
>>> '-'.join(seq)
'lyx-love-um'
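
The heading's claim that join() undoes split() can be checked with a round trip. The sketch below also shows two limits of that claim; the lists tricky and numbers are examples made up for this purpose.

```python
seq = ['lyx', 'love', 'um']

joined = '-'.join(seq)
print(joined)                      # lyx-love-um
round_trip = joined.split('-')
print(round_trip == seq)           # True: split undoes join

# The round trip only holds when no element contains the separator.
tricky = ['a-b', 'c']
tricky_trip = '-'.join(tricky).split('-')
print(tricky_trip)                 # ['a', 'b', 'c']

# join() accepts only strings, so convert numbers first.
numbers = [8, 3, 2]
numbers_joined = ' '.join(str(n) for n in numbers)
print(numbers_joined)              # 8 3 2
```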

Original article: https://blog.csdn.net/m0_47010003/article/details/133790135