• Python Skill Tree Exercise: Counting Word Frequency


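    Contents

    1. Problem

    2. Given code: complete the TODO part

    3. Code questions and conclusions

    3.1 get() with a custom default return value

    3.2 values() and list()

    3.3 min()

    3.4 key=lambda in sort()

    3.5 list[0:5]

    3.6 print(f'...')

    3.7 append()

    3.8 join() is the inverse of split()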


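    1. Problem

    Write a piece of Python code that counts the frequency of the English words in the two triple-quoted Python strings below. Requirements:

      1. Ignore case.
      2. Use the array splits = ['\n', ' ', '-', ':', '/', '*', '_', '(', ')', '"', '”', '“',']','['] to split the text into words.
      3. Print the 5 most frequent words together with their counts.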
    '''
    * Python 代码风格指南',
    * [google-python-styleguide_zh_cn](https://zh-google-styleguide.readthedocs.io/en/latest/google-python-styleguide/python_style_rules /)
    * [PEP8](https://legacy.python.org/dev/peps/pep-0008/)
    * 代码风格和自动完成工具链
    * 基本工具
    * [pylint](https://pylint.org/)
    * [autopep8](https://pypi.org/project/autopep8/)
    * Visual Studio Code Python 开发基本插件
    * Pylance
    * Python Path
    * Python-autopep8
    '''

    '''
    Every major open-source project has its own style guide: a set of conventions (sometimes arbitrary) about how to write code for that project. It is much easier to understand a large codebase when all the code in it is in a consistent style.
    “Style” covers a lot of ground, from “use camelCase for variable names” to “never use global variables” to “never use exceptions.” This project (google/styleguide) links to the style guidelines we use for Google code. If you are modifying a project that originated at Google, you may be pointed to this page to see the style guides that apply to that project.
    This project holds the C++ Style Guide, C# Style Guide, Swift Style Guide, Objective-C Style Guide, Java Style Guide, Python Style Guide, R Style Guide, Shell Style Guide, HTML/CSS Style Guide, JavaScript Style Guide, TypeScript Style Guide, AngularJS Style Guide, Common Lisp Style Guide, and Vimscript Style Guide. This project also contains cpplint, a tool to assist with style guide compliance, and google-c-style.el, an Emacs settings file for Google style.
    '''

    Expected result:

    2. Given code: complete the TODO part

    def top_words(splits, text, top_n=5):
        i = 0
        word_dict = {}  # maps word -> {'word': word, 'count': n}
        chars = []      # characters of the word currently being read
        while i < len(text):
            c = text[i]
            if c in splits:
                # Skip over a run of consecutive separator characters
                while i+1 < len(text) and text[i+1] in splits:
                    i += 1
                # A word is only counted here, i.e. when a separator follows it
                word = ''.join(chars).lower()
                # Count the word
                # TODO(You): add your code here
                word_info = word_dict.get(word)
                if word_info is None:
                    word_info = {'word': word, 'count': 1}
                    word_dict[word] = word_info
                else:
                    word_info['count'] += 1
                chars = []
            else:
                chars.append(c)
            i += 1
        word_list = list(word_dict.values())
        top_n = min(top_n, len(word_list))
        word_list.sort(key=lambda word_info: word_info['count'], reverse=True)
        return word_list[0:top_n]


    if __name__ == '__main__':
        # The two triple-quoted strings from the problem statement go here
        google_style_guide = ...
        python_style_guides = ...
        # Shorter than the list in the problem statement: '\n', ']' and '[' are missing
        splits = [' ', '-', ':', '/', '*', '_', '(', ')', '"', '”', '“']
        tops = top_words(splits, google_style_guide+python_style_guides)
        print('Word ranking')
        print('--------')
        i = 0
        while i < len(tops):
            top = tops[i]
            word = top['word']
            count = top['count']
            print(f'{i+1}. word: {word}, count: {count}')
            i += 1
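
    For comparison, here is a minimal sketch of the same task built on the standard library's re.split and collections.Counter instead of the hand-written loop. It is not the exercise's reference solution: the function name top_words_counter and the sample text are made up for illustration, and unlike the loop above it drops empty strings and also counts a trailing word.

```python
import re
from collections import Counter


def top_words_counter(splits, text, top_n=5):
    """Return the top_n (word, count) pairs, ignoring case."""
    # Build one character class from the separator list; the trailing +
    # treats a run of separators as a single split point.
    pattern = '[' + ''.join(re.escape(s) for s in splits) + ']+'
    # re.split can yield empty strings at the edges, so filter them out.
    words = [w.lower() for w in re.split(pattern, text) if w]
    return Counter(words).most_common(top_n)


# Hypothetical sample text, not the exercise input.
sample = 'Style Guide: a style-guide for Python style'
result = top_words_counter([' ', '-', ':'], sample, top_n=2)
print(result)  # [('style', 3), ('guide', 2)]
```

    Counter.most_common() returns (word, count) tuples rather than the {'word': ..., 'count': ...} dictionaries used in the exercise, so the printing loop would need a small change to use it.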

    3. Code questions and conclusions

    3.1 get() with a custom default return value

    word_info = word_dict.get(word, {'word': word, 'count': 0})
    word_info['count'] += 1
    word_dict[word] = word_info

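    Understanding the code:

    word_dict[word] looks up the entry of word_dict whose key is the value of word.

    word_info['count'] looks up the entry of word_info whose key is 'count'.

    In both cases the subscript is a dictionary key (the key of a key/value pair), not a positional index.

    word_info is a dictionary of the form {'word': ..., 'count': ...}, for example {'word': 'test', 'count': 1}.

    I could not understand the assignment in the following code:

    word_info = word_dict.get(word[1], {"word": word[1], "count": 0})

    1. What does get() do if word_dict does not contain word? Is word_info empty in that case?

    If the key is not in the dictionary, you can supply your own return value, for example dict.get('key', 'never'): if the key is in the dictionary, get() returns the value stored under it; if not, it returns 'never'.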

    dict1 = {'country': 'China', 'capital': 'Beijing'}
    print(dict1.get('country'))                      # China
    print(dict1.get('capital', 'never'))             # Beijing
    print(dict1.get('provincial capital', 'never'))  # never
    Source: https://blog.csdn.net/weixin_42709563/article/details/106593628

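    So:

    word = 'test'
    word_dict = {}
    word_info = word_dict.get(word, {"word": word, "count": 0})

    Because the value of word ('test') is not a key of word_dict (which is empty), get(word, {"word": word, "count": 0}) returns the default directly, so word_info is {'word': 'test', 'count': 0}. Note that get() only returns the default; it does not store it in word_dict.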

    Based on this understanding, here is an example:

    word = 'test'
    word2 = 'python'
    word_dict = {}
    # Neither key is in word_dict, so get() returns the supplied default
    word_info = word_dict.get(word, {"word": word, "count": 0})
    word_info2 = word_dict.get(word2, {"word": word2, "count": 0})
    word_dict[word] = word_info
    word_dict[word2] = word_info2
    print(word_info)
    print("-----------")
    print(word_info2)
    print("-----------")
    print(word_dict)

    The line

    word_dict[word2] = word_info2

    stores word_info2 as the value in word_dict, under the key given by the value of word2.

    The output is:

    {'word': 'test', 'count': 0}
    -----------
    {'word': 'python', 'count': 0}
    -----------
    {'test': {'word': 'test', 'count': 0}, 'python': {'word': 'python', 'count': 0}}
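
    To check the point that get() returns the default without storing it, here is a small sketch that counts the same word twice using the three-line pattern from this section. The variable names info and snapshot are only for illustration.

```python
word_dict = {}

# get() returns the default but leaves the dictionary untouched.
info = word_dict.get('test', {'word': 'test', 'count': 0})
snapshot = dict(word_dict)  # copy of the dictionary right after get()
print(info)      # {'word': 'test', 'count': 0}
print(snapshot)  # {}

# The entry only exists after an explicit assignment.
info['count'] += 1
word_dict['test'] = info

# Second occurrence: the key exists now, so the default is ignored.
info = word_dict.get('test', {'word': 'test', 'count': 0})
info['count'] += 1
word_dict['test'] = info
print(word_dict)  # {'test': {'word': 'test', 'count': 2}}
```

    This is why the third line, word_dict[word] = word_info, is needed: without it, a word seen for the first time would never be saved.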

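    3.2 values() and list()

    What does the assignment word_list = list(word_dict.values()) in the given code mean?

    My understanding is as follows:

     list(word_dict.values())

    word_dict stores key/value pairs: {'test': {'word': 'test', 'count': 0}, 'python': {'word': 'python', 'count': 0}}

    word_dict.values() returns a view of the values: dict_values([{'word': 'test', 'count': 0}, {'word': 'python', 'count': 0}])

    list(word_dict.values()) turns that view into a list: [{'word': 'test', 'count': 0}, {'word': 'python', 'count': 0}]

    1. The values become the elements of a list.
    2. The elements keep the order in which the keys were inserted into the dictionary (Python 3.7+); the list is not sorted by count yet.
    3. Each index corresponds to one element.

    After word_list = list(word_dict.values()), the elements can be accessed as word_list[0], word_list[1], and so on.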
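
    The difference between the dict_values view and the list can be seen in a short sketch. The names view and snapshot are made up for this illustration.

```python
word_dict = {'test': {'word': 'test', 'count': 0}}

view = word_dict.values()            # live view of the values
snapshot = list(word_dict.values())  # independent list, supports indexing

word_dict['python'] = {'word': 'python', 'count': 0}

print(len(view))      # 2, the view follows the dictionary
print(len(snapshot))  # 1, the list was fixed when it was created
print(snapshot[0])    # {'word': 'test', 'count': 0}

try:
    view[0]
except TypeError as exc:
    # A dict_values object cannot be indexed, which is one reason to call list().
    print('views cannot be indexed:', exc)
```

    The list also has a sort() method, which the view does not, and that is what the given code uses next.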

    3.3 min()

    top_n = min(top_n, len(word_list))

    min(n, m) simply returns the smaller of the two values. Here it caps top_n at the number of distinct words in word_list.
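
    A quick sketch of the capping, with a made-up three-element list:

```python
word_list = ['a', 'b', 'c']

for requested in (2, 5):
    top_n = min(requested, len(word_list))
    print(requested, '->', top_n, word_list[0:top_n])
# 2 -> 2 ['a', 'b']
# 5 -> 3 ['a', 'b', 'c']

# Slicing past the end does not raise an error, so in this code the cap
# mainly makes the intent explicit.
print(word_list[0:5])  # ['a', 'b', 'c']
```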

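    3.4 key=lambda in sort()

    1. What does key=lambda mean in sort()?

    2. What is the difference between the following two calls?

    word_list.sort(key=lambda t_word: t_word['count'])
    word_list.sort(key=lambda word_info: word_info['count'], reverse=True)

    Reference: "python 小记 lambda表达式 与 sort函数中的key" (CSDN blog post), https://blog.csdn.net/lishuaigell/article/details/124158763

    From that post: a lambda expression declares an anonymous function, that is, a small temporary function without a name, used where something function-like is needed but a full definition is not wanted. Typical places are the key parameter of the built-in sorted() and of the list method sort(), and the first argument of the built-in map() and filter(). A lambda expression can also be assigned to a name. It may contain only a single expression, not statements or complex structures, but the expression may call other functions, and its result is the function's return value.

    From these notes, lambda word_info: word_info['count'] means that each element is sorted by the value stored under its 'count' key.

    Without reverse, the sort is ascending; reverse=True makes it descending. The parameter name (t_word or word_info) makes no difference, so the only difference between the two calls is the sort direction.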

    word = 'liuYuXin'
    word2 = 'python'
    word3 = 'popping'
    word4 = 'google'
    word_dict = {}
    # None of the keys is in word_dict, so get() returns the supplied default
    word_info = word_dict.get(word, {"word": word, "count": 2})
    word_info2 = word_dict.get(word2, {"word": word2, "count": 1})
    word_info3 = word_dict.get(word3, {"word": word3, "count": 5})
    word_info4 = word_dict.get(word4, {"word": word4, "count": 7})
    word_dict[word] = word_info
    word_dict[word2] = word_info2
    word_dict[word3] = word_info3
    word_dict[word4] = word_info4
    print(word_info)
    print("-----------")
    print(word_info2)
    print("-----------")
    print(word_dict)
    print("-----------")
    print(word_dict.values())
    print("-----------")
    print(list(word_dict.values()))
    print("-----------")
    top_n = 3
    word_list = list(word_dict.values())
    # Ascending sort by count: the counts end up in the order 1, 2, 5, 7
    word_list.sort(key=lambda t_word: t_word['count'])
    print(word_list[0:top_n])
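
    To see the two sort directions side by side, here is a small sketch with the same four counts. It uses sorted() so that the original list stays unchanged, and operator.itemgetter as an equivalent key; both are swapped in for illustration and are not part of the exercise code.

```python
from operator import itemgetter

word_list = [
    {'word': 'liuYuXin', 'count': 2},
    {'word': 'python', 'count': 1},
    {'word': 'popping', 'count': 5},
    {'word': 'google', 'count': 7},
]

# The lambda parameter name is arbitrary; these two keys are equivalent.
ascending = sorted(word_list, key=lambda t_word: t_word['count'])
also_ascending = sorted(word_list, key=itemgetter('count'))

# reverse=True flips the order; sorted() leaves word_list itself unchanged.
descending = sorted(word_list, key=lambda word_info: word_info['count'], reverse=True)

print([w['word'] for w in ascending])   # ['python', 'liuYuXin', 'popping', 'google']
print([w['word'] for w in descending])  # ['google', 'popping', 'liuYuXin', 'python']
```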

    3.5 list[0:5]


    word_list[0:top_n]

    - This returns a new list with the elements at indices 0 up to, but not including, top_n.

    The example is the same code as in section 3.4: its last line, print(word_list[0:top_n]) with top_n = 3, prints the first three elements of the sorted list.
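
    A few slices on a made-up list show where the boundaries fall. The list ranked is hypothetical and stands in for the sorted word_list.

```python
ranked = ['google', 'popping', 'liuYuXin', 'python']

print(ranked[0:3])   # ['google', 'popping', 'liuYuXin'] (indices 0, 1, 2)
print(ranked[:3])    # same result, the start defaults to 0
print(ranked[0:10])  # all four items, an oversized stop is clamped
print(ranked[2:])    # ['liuYuXin', 'python']

# Slicing copies: changing the slice leaves the original list alone.
top = ranked[0:2]
top.append('extra')
print(len(ranked))   # 4
```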

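    3.6 print(f'...')

    print(f'{i + 1}. word: {word}, count: {count}')

    The f prefix marks the string as an f-string, so the expressions inside {} are evaluated and their results are inserted.

    Here i + 1 and count are ints, but the f prefix (not print() itself) converts them to strings, so they are output together with the surrounding text.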
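
    The effect of the f prefix is easiest to see by comparing it with a plain string. The values of word and count below are made up for illustration.

```python
i = 0
word = 'style'
count = 19  # hypothetical value for illustration

line = f'{i + 1}. word: {word}, count: {count}'
print(line)  # 1. word: style, count: 19

# Without the f prefix the braces are printed literally.
print('{i + 1}. word: {word}')  # {i + 1}. word: {word}

# Plain concatenation needs an explicit str() for each int.
same = str(i + 1) + '. word: ' + word + ', count: ' + str(count)
print(line == same)  # True
```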

    3.7 append()

    chars = []

    chars.append(c)

    chars must be a list ([]); a dict ({}) has no append() method. chars.append(c) adds c to the end of the list.
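
    Here is a toy version of how the given code collects the characters of one word, assuming '-' is the only separator:

```python
chars = []  # a list; a dict ({}) has no append() method
for c in 'Py-th':
    if c == '-':
        continue  # skip the separator in this toy example
    chars.append(c)

print(chars)                   # ['P', 'y', 't', 'h']
print(''.join(chars).lower())  # pyth
```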

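    3.8 join() is the inverse of split()

         word = ''.join(chars).lower()

    ' '.join(xxx) concatenates the string elements of xxx, with a space between them.

    '-'.join(xxx) concatenates the string elements of xxx, with - between them.

    ''.join(chars), as used above, concatenates the characters with nothing in between.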

    l = ['8', '3', '2', '7', '1']
    s = ' '.join(l)
    print(s)  # 8 3 2 7 1

    Style 1: define the separator first
    >>> symbol = "-"  # separator
    >>> seq = ["lyx", "love", "um"]  # sequence of strings
    >>> symbol.join(seq)
    'lyx-love-um'
    Style 2: skip the separator variable and use the literal directly
    >>> seq = ["lyx", "love", "um"]
    >>> '-'.join(seq)
    'lyx-love-um'
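
    The "inverse" relationship from the section title can be checked with a round trip: split on a separator, then join with the same separator. This is a small sketch using the example string from above.

```python
s = 'lyx-love-um'

parts = s.split('-')
rejoined = '-'.join(parts)
print(parts)          # ['lyx', 'love', 'um']
print(rejoined)       # lyx-love-um
print(rejoined == s)  # True

# join() only accepts strings, so numbers must be converted first.
numbers = [8, 3, 2, 7, 1]
joined_numbers = ' '.join(str(n) for n in numbers)
print(joined_numbers)  # 8 3 2 7 1
```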

  • Original article: https://blog.csdn.net/m0_47010003/article/details/133790135