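Write a Python word-count program that computes the frequency of the English words in the two Python triple-quoted strings below. Requirements:
- Ignore case when comparing words
- Split words using the array splits = ['\n', ' ', '-', ':', '/', '*', '_', '(', ')', '"', '”', '“', ']', '[']
- Output the 5 most frequent words together with their frequencies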
```python
'''
* Python 代码风格指南',
* [google-python-styleguide_zh_cn](https://zh-google-styleguide.readthedocs.io/en/latest/google-python-styleguide/python_style_rules/)
* [PEP8](https://legacy.python.org/dev/peps/pep-0008/)
* 代码风格和自动完成工具链
* 基本工具
* [pylint](https://pylint.org/)
* [autopep8](https://pypi.org/project/autopep8/)
* Visual Studio Code Python 开发基本插件
* Pylance
* Python Path
* Python-autopep8
'''
'''
Every major open-source project has its own style guide: a set of conventions (sometimes arbitrary) about how to write code for that project. It is much easier to understand a large codebase when all the code in it is in a consistent style.
“Style” covers a lot of ground, from “use camelCase for variable names” to “never use global variables” to “never use exceptions.” This project (google/styleguide) links to the style guidelines we use for Google code. If you are modifying a project that originated at Google, you may be pointed to this page to see the style guides that apply to that project.
This project holds the C++ Style Guide, C# Style Guide, Swift Style Guide, Objective-C Style Guide, Java Style Guide, Python Style Guide, R Style Guide, Shell Style Guide, HTML/CSS Style Guide, JavaScript Style Guide, TypeScript Style Guide, AngularJS Style Guide, Common Lisp Style Guide, and Vimscript Style Guide. This project also contains cpplint, a tool to assist with style guide compliance, and google-c-style.el, an Emacs settings file for Google style.
'''
```
Expected result:

```python
def top_words(splits, text, top_n=5):
    i = 0
    word_dict = {}
    chars = []
    while i < len(text):
        c = text[i]
        if c in splits:
            # Skip any further separator characters that follow this one
            while i + 1 < len(text) and text[i + 1] in splits:
                i += 1
            word = ''.join(chars).lower()

            # Count word frequency (the part the exercise's TODO(You) asks for)
            if word:  # a leading separator yields an empty word; don't count it
                word_info = word_dict.get(word)
                if word_info is None:
                    word_info = {'word': word, 'count': 1}
                    word_dict[word] = word_info
                else:
                    word_info['count'] += 1
            chars = []
        else:
            chars.append(c)

        i += 1

    word_list = list(word_dict.values())
    top_n = min(top_n, len(word_list))
    word_list.sort(key=lambda word_info: word_info['count'], reverse=True)
    return word_list[0:top_n]


if __name__ == '__main__':
    google_style_guide = ...   # replace ... with the second triple-quoted string above
    python_style_guides = ...  # replace ... with the first triple-quoted string above
    splits = ['\n', ' ', '-', ':', '/', '*', '_', '(', ')', '"', '”', '“', ']', '[']

    tops = top_words(splits, google_style_guide + python_style_guides)

    print('Word ranking')
    print('--------')
    i = 0
    while i < len(tops):
        top = tops[i]
        word = top['word']
        count = top['count']
        print(f'{i+1}. Word: {word}, frequency: {count}')
        i += 1
```

The TODO can also be filled with this shorter get()-with-default form, which replaces the if/else block inside the `if word:` check:

```python
word_info = word_dict.get(word, {'word': word, 'count': 0})
word_info['count'] += 1
word_dict[word] = word_info
```
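For comparison, here is a compact sketch of the same task. It does not use the character-by-character loop in top_words. Instead it swaps in a different technique: `re.split` with a character class built from the separators, followed by `collections.Counter`. The function name `top_words_counter` is hypothetical. Unlike top_words, it returns `(word, count)` tuples rather than `{'word': ..., 'count': ...}` dicts.

```python
import re
from collections import Counter

# Separators from the exercise requirements
SPLITS = ['\n', ' ', '-', ':', '/', '*', '_', '(', ')', '"', '”', '“', ']', '[']


def top_words_counter(splits, text, top_n=5):
    """Hypothetical helper: return the top_n (word, count) pairs, ignoring case."""
    # One character class matching any run of separators; re.escape protects '-', '[', ']' etc.
    pattern = '[' + ''.join(re.escape(s) for s in splits) + ']+'
    # re.split yields '' when the text starts or ends with a separator; drop those
    words = [w.lower() for w in re.split(pattern, text) if w]
    # most_common() orders equal counts by first occurrence
    return Counter(words).most_common(top_n)


print(top_words_counter(SPLITS, "Style style STYLE guide-guide python"))
# [('style', 3), ('guide', 2), ('python', 1)]
```

With this approach, the last word is counted even when the text does not end with a separator.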

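Understanding the code:
word_dict[word] looks up an entry in word_dict, using the value of word as the key.
word_info['count'] looks up the entry of word_info whose key is 'count'.
Both are key lookups, that is, the key side of a key-value pair, not numeric indexes.
word_info has the form {'word': <the word>, 'count': <its frequency>}.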
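The sketch below makes the two key lookups concrete. The one-entry dictionary is made-up sample data in the same shape the exercise builds. It also shows why square-bracket lookup of a missing key is a problem.

```python
# Made-up sample data in the exercise's shape: word -> {'word': ..., 'count': ...}
word_dict = {'style': {'word': 'style', 'count': 3}}

word_info = word_dict['style']      # look up the inner dict by the key 'style'
print(word_info)                    # {'word': 'style', 'count': 3}
print(word_info['count'])           # 3, looked up by the key 'count'

# Square-bracket lookup of a missing key raises KeyError
try:
    word_dict['guide']
except KeyError as err:
    print('missing key:', err)      # missing key: 'guide'
```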
I did not understand the assignment in this line:
word_info = word_dict.get(word, {"word": word, "count": 0})
1. Inside get(), what happens if word_dict does not contain word? Is word_info empty?
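If the key may be missing from the dictionary and you want to choose the return value yourself, pass it as the second argument, for example dict.get('key', 'never'). If the key is in the dictionary, get() returns the value for that key; if it is not, get() returns 'never'. Without a second argument, get() returns None for a missing key.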
```python
dict1 = {'country': 'China', 'capital': 'Beijing'}
print(dict1.get('country'))                      # China
print(dict1.get('capital', 'never'))             # Beijing
print(dict1.get('provincial_capital', 'never'))  # never
```
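Source: https://blog.csdn.net/weixin_42709563/article/details/106593628

So, consider this code:

```python
word = 'test'
word_dict = {}
word_info = word_dict.get(word, {"word": word, "count": 0})
```

The value of word ('test') is not a key in word_dict, which is empty. So get(word, {"word": word, "count": 0}) returns the default directly, and word_info is {'word': 'test', 'count': 0}. get() only returns the default; it does not add it to word_dict.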
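This sketch answers question 1 directly. It shows what get() returns with and without a default, and confirms that get() leaves the dictionary unchanged. The last lines show `dict.setdefault()`, a related standard method that is not used in the exercise: it returns the default and also stores it.

```python
word_dict = {}

# Without a default, get() returns None for a missing key (no KeyError)
print(word_dict.get('test'))          # None

# With a default, get() returns the default instead
word_info = word_dict.get('test', {'word': 'test', 'count': 0})
print(word_info)                      # {'word': 'test', 'count': 0}

# get() never inserts anything: the dict is still empty
print(word_dict)                      # {}

# setdefault() returns the default AND stores it under the key
word_info = word_dict.setdefault('test', {'word': 'test', 'count': 0})
print(word_dict)                      # {'test': {'word': 'test', 'count': 0}}
```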
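Based on this understanding, here is a worked example. The printed output is shown in the comments:

```python
word = 'test'
word2 = 'python'
word_dict = {}
# Neither key is in word_dict yet, so get() returns the default {"word": ..., "count": 0}
word_info = word_dict.get(word, {"word": word, "count": 0})
word_info2 = word_dict.get(word2, {"word": word2, "count": 0})
word_dict[word] = word_info
word_dict[word2] = word_info2
print(word_info)    # {'word': 'test', 'count': 0}
print("-----------")
print(word_info2)   # {'word': 'python', 'count': 0}
print("-----------")
print(word_dict)    # {'test': {'word': 'test', 'count': 0}, 'python': {'word': 'python', 'count': 0}}
```

word_dict[word2] = word_info2 uses the value of word2 ('python') as the key and stores word_info2 as the corresponding value in word_dict.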
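The next sketch puts the get()-with-default pattern into a loop, which is how it counts words in the exercise. The helper name `count_words` and its input list are hypothetical illustrations.

```python
def count_words(words):
    """Hypothetical helper: count words with the get-with-default pattern."""
    word_dict = {}
    for word in words:
        # Existing entry, or a fresh {'word': ..., 'count': 0} for a new word
        word_info = word_dict.get(word, {'word': word, 'count': 0})
        word_info['count'] += 1
        # Store it back; for an existing word this re-assigns the same dict object
        word_dict[word] = word_info
    return word_dict


counts = count_words(['test', 'python', 'test'])
print(counts)
# {'test': {'word': 'test', 'count': 2}, 'python': {'word': 'python', 'count': 1}}
```

For a word already in word_dict, get() returns the stored dict itself. The `+= 1` therefore updates it in place, so the final assignment only matters for new words.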
What do the assignments in the following lines mean?
My understanding:
list(word_dict.values())
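word_dict holds key-value pairs: {'test': {'word': 'test', 'count': 0}, 'python': {'word': 'python', 'count': 0}}
word_dict.values() returns the values: dict_values([{'word': 'test', 'count': 0}, {'word': 'python', 'count': 0}])
list(word_dict.values()) gives: [{'word': 'test', 'count': 0}, {'word': 'python', 'count': 0}]
- the values become the elements of a list
- the elements keep the dictionary's insertion order (guaranteed since Python 3.7); no sorting happens here
- each index refers to one element
After word_list = list(word_dict.values()), the elements can therefore be read as word_list[0], word_list[1], and so on. The dict_values view itself does not support indexing.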
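The following sketch shows why the list() conversion is needed before indexing. It uses the same two-entry dictionary as above.

```python
word_dict = {'test': {'word': 'test', 'count': 0},
             'python': {'word': 'python', 'count': 0}}

values = word_dict.values()
print(values)
# dict_values([{'word': 'test', 'count': 0}, {'word': 'python', 'count': 0}])

# A dict_values view does not support indexing...
try:
    values[0]
except TypeError as err:
    print('TypeError:', err)

# ...so convert it to a list first; the order follows dict insertion order
word_list = list(values)
print(word_list[0])   # {'word': 'test', 'count': 0}
print(word_list[1])   # {'word': 'python', 'count': 0}
```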
top_n = min(top_n, len(word_list)): min(n, m) simply returns the smaller of the two values, so top_n is capped at the number of distinct words.
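1. What does key=lambda mean in sort()?
2. What is the difference between these two calls?
word_list.sort(key=lambda t_word: t_word['count'])
word_list.sort(key=lambda word_info: word_info['count'], reverse=True)
lambda word_info: word_info['count'] is an anonymous function that takes one element of word_list and returns its 'count' value. sort() calls it on every element and orders the elements by the returned values; the parameter name (t_word or word_info) makes no difference.
Without reverse, the list is sorted in ascending order. reverse=True sorts it in descending order, which puts the most frequent words first.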
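This sketch compares the two orders on a small sample list, with counts borrowed from the experiment below. It uses `sorted()`, which returns a new list, so both orders can be built from the same input. `list.sort()` in the exercise sorts in place and returns None.

```python
word_list = [{'word': 'liuYuXin', 'count': 2},
             {'word': 'python', 'count': 1},
             {'word': 'google', 'count': 7}]

# key= receives each element and returns the value to compare by;
# the lambda is equivalent to: def by_count(t_word): return t_word['count']
ascending = sorted(word_list, key=lambda t_word: t_word['count'])
descending = sorted(word_list, key=lambda t_word: t_word['count'], reverse=True)

print([w['word'] for w in ascending])    # ['python', 'liuYuXin', 'google']
print([w['word'] for w in descending])   # ['google', 'liuYuXin', 'python']
```

`operator.itemgetter('count')` from the standard library can replace the lambda with the same effect.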
```python
word = 'liuYuXin'
word2 = 'python'
word3 = 'popping'
word4 = 'google'
word_dict = {}
# None of the keys is in word_dict yet, so each get() returns its default dict
word_info = word_dict.get(word, {"word": word, "count": 2})
word_info2 = word_dict.get(word2, {"word": word2, "count": 1})
word_info3 = word_dict.get(word3, {"word": word3, "count": 5})
word_info4 = word_dict.get(word4, {"word": word4, "count": 7})
word_dict[word] = word_info
word_dict[word2] = word_info2
word_dict[word3] = word_info3
word_dict[word4] = word_info4
print(word_info)
print("-----------")
print(word_info2)
print("-----------")
print(word_dict)
print("-----------")
print(word_dict.values())
print("-----------")
print(list(word_dict.values()))
print("-----------")
top_n = 3
word_list = list(word_dict.values())
word_list.sort(key=lambda t_word: t_word['count'])  # ascending by count
print(word_list[0:top_n])
# [{'word': 'python', 'count': 1}, {'word': 'liuYuXin', 'count': 2}, {'word': 'popping', 'count': 5}]
```
word_list[0:top_n]
- returns the part of the list from index 0 up to, but not including, index top_n
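A short sketch of the slice boundaries, using a made-up list of words. The end index is excluded, and a slice that runs past the end of the list does not raise an error.

```python
word_list = ['google', 'popping', 'liuYuXin', 'python']

print(word_list[0:3])    # ['google', 'popping', 'liuYuXin']  (indices 0, 1, 2; 3 is excluded)
print(word_list[0:10])   # ['google', 'popping', 'liuYuXin', 'python']  (no IndexError past the end)

# So min(top_n, len(word_list)) in top_words acts as a safeguard, not a requirement
top_n = min(10, len(word_list))
print(top_n)             # 4
```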
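print(f'{i + 1}. Word: {word}, frequency: {count}'): the f prefix makes this an f-string, so the expressions inside {} are evaluated and their results are inserted into the text.
Here i + 1 and count are ints. Inside an f-string they are formatted as text automatically and printed together with the rest of the string, with no explicit str() call needed.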
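The sketch below compares an f-string with manual string concatenation. The values of i, word and count are made up for illustration.

```python
i, word, count = 0, 'style', 12   # made-up values

# Expressions inside {} are evaluated and their results formatted as text
line = f'{i + 1}. Word: {word}, frequency: {count}'
print(line)   # 1. Word: style, frequency: 12

# Without an f-string, the ints must be converted explicitly before concatenation
line2 = str(i + 1) + '. Word: ' + word + ', frequency: ' + str(count)
assert line == line2
```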
chars.append(c): chars = [] makes chars an empty list (not a dict).
chars.append(c) adds c to the end of that list.
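word = ''.join(chars).lower(): ' '.join(xxx) joins the string elements of xxx into one string with a space between them.
'-'.join(xxx) does the same with '-' between them. ''.join(chars) uses the empty string as the separator, so the characters are simply concatenated into one word, and .lower() then converts it to lowercase.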
```python
l = ['8', '3', '2', '7', '1']
s = ' '.join(l)
print(s)  # 8 3 2 7 1
```
Form 1:

```python
>>> symbol = "-"                 # separator
>>> seq = ["lyx", "love", "um"]  # sequence of strings
>>> symbol.join(seq)
'lyx-love-um'
```

Form 2: skip defining the separator and use it directly:

```python
>>> seq = ["lyx", "love", "um"]
>>> '-'.join(seq)
'lyx-love-um'
```
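To tie append() and join() together, this sketch rebuilds one word from its characters, roughly the way top_words does. The input 'PyThon' is a made-up example. It also shows that join() accepts only strings.

```python
# Collect characters one at a time, as top_words does for each word
chars = []
for c in 'PyThon':
    chars.append(c)                 # append() adds c to the end of the list
print(chars)                        # ['P', 'y', 'T', 'h', 'o', 'n']

# '' is the separator, so the pieces are glued together with nothing between them
word = ''.join(chars).lower()
print(word)                         # python

# join() accepts only strings, so numbers must be converted first
numbers = [8, 3, 2]
print(' '.join(str(n) for n in numbers))  # 8 3 2
```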