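Loop over the elements of an iterable;
if the dictionary does not yet contain an element, add that element as a key and set its value to 1;
if the element is already a key, increment its value by 1;
For example: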
>>> lists = ['a', 'a', 'b', 5, 6, 7, 5]
>>> count_dict = dict()
>>> for item in lists:
...     if item in count_dict:
...         count_dict[item] += 1
...     else:
...         count_dict[item] = 1
...
>>> count_dict
{'a': 2, 'b': 1, 5: 2, 6: 1, 7: 1}
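The same loop can be written a little more compactly with dict.get, which returns a default when the key is missing. A minimal sketch; the function name count_items is hypothetical and not part of the original notes:

```python
def count_items(iterable):
    """Count occurrences of each element using a plain dict."""
    counts = {}
    for item in iterable:
        # dict.get returns 0 when item is not yet a key
        counts[item] = counts.get(item, 0) + 1
    return counts


lists = ['a', 'a', 'b', 5, 6, 7, 5]
count_dict = count_items(lists)
print(count_dict)
```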
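defaultdict(parameter) accepts a factory argument, typically a type such as str or int;
the argument does not constrain the type of the values, and certainly not the type of the keys; it is called to create the initial value when a key does not exist yet;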
>>> from collections import defaultdict
>>> lists = ['a', 'a', 'b', 5, 6, 7, 5]
>>> count_dict = defaultdict(int)
>>> for item in lists:
...     count_dict[item] += 1
...
>>> count_dict
defaultdict(<class 'int'>, {'a': 2, 'b': 1, 5: 2, 6: 1, 7: 1})
The result is the same as with method (1), the plain dict loop;
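To illustrate that the argument to defaultdict only initializes missing keys and does not restrict types, here is a small sketch. The variable names counts, missing, and positions are made up for this example:

```python
from collections import defaultdict

# int() returns 0, so a missing key starts at 0 before "+= 1" runs.
counts = defaultdict(int)
counts['a'] += 1

# Merely reading a missing key also creates it with the default value.
missing = counts['never seen']

# The factory does not constrain value types: any value can still be assigned.
counts['b'] = 'not an int'

# With list as the factory, missing keys start as an empty list,
# which is handy for grouping (here: the positions of each element).
positions = defaultdict(list)
for index, item in enumerate(['a', 'a', 'b', 5, 6, 7, 5]):
    positions[item].append(index)

print(dict(counts))
print(dict(positions))
```

Note that reading a missing key inserts it, so membership tests with `in` are the safer way to check whether something has been counted.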
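First deduplicate with set;
then loop over the unique elements and, for each one, append a tuple of the element and its count, lists.count(item), to a list.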
>>> lists = ['a', 'a', 'b', 5, 6, 7, 5]
>>> count_set = set(lists)
>>> count_list = list()
>>> for item in count_set:
...     count_list.append((item, lists.count(item)))
...
>>> count_list
[('b', 1), (5, 2), (6, 1), (7, 1), ('a', 2)]
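Because a set has no defined order, the order of count_list can differ between runs. The sketch below sorts the pairs by count, highest first, for a more stable presentation. Keep in mind that lists.count(item) scans the whole list once per unique element, so this approach is likely slower on large inputs than the single-pass dictionary methods:

```python
lists = ['a', 'a', 'b', 5, 6, 7, 5]

# One (element, count) pair per unique element.
count_list = [(item, lists.count(item)) for item in set(lists)]

# Sort by count, highest first; elements with equal counts keep
# whatever relative order the set happened to produce.
count_list.sort(key=lambda pair: pair[1], reverse=True)
print(count_list)
```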
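Counter is a container object (a dict subclass) used mainly to count hashable objects; it can be initialized in three ways:
(1) from an iterable: Counter("success");
(2) from keyword arguments: Counter(s=3, c=2, e=1, u=1);
(3) from a dictionary: Counter({"s": 3, "c": 2, "e": 1, "u": 1});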
>>> from collections import Counter
>>> lists = ['a', 'a', 'b', 5, 6, 7, 5]
>>> a = Counter(lists)
>>> a
Counter({'a': 2, 5: 2, 'b': 1, 6: 1, 7: 1})
>>> a.elements()
<itertools.chain object at 0x7fbffc3a8310>
>>> a.most_common(2)  # the two most frequent elements and their counts, returned as a list of tuples
[('a', 2), (5, 2)]
>>> b = Counter("success")
>>> b
Counter({'s': 3, 'c': 2, 'u': 1, 'e': 1})
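As a quick check of the three initialization styles, the following sketch builds the same Counter three ways and compares them. The variable names are chosen for this example only:

```python
from collections import Counter

from_iterable = Counter("success")
from_keywords = Counter(s=3, c=2, e=1, u=1)
from_mapping = Counter({"s": 3, "c": 2, "e": 1, "u": 1})

# All three hold the same counts.
print(from_iterable == from_keywords == from_mapping)

# elements() returns an iterator; wrap it in list() or sorted() to see the items.
print(sorted(from_iterable.elements()))

# A missing key reads as 0 without raising KeyError and without being added.
print(from_iterable["z"])
```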
Reference: "python中统计计数的几种方法和Counter的介绍", dgteu28864's blog, CSDN
These are my own notes for the record; please go and read the original author's post!
This uses the method from 1.3 (set plus lists.count):
>>> import pandas as pd
>>> # df_train, df_dev and df_test are existing DataFrames (not shown)
>>> df_data = pd.concat([df_train, df_dev, df_test], ignore_index=True)
>>> lists = list(df_data['label'])
>>> count_set = set(lists)
>>> count_list = list()
>>> for item in count_set:
...     count_list.append((item, lists.count(item)))
...
>>> count_list
Out[6]: [(0, 2046), (1, 1743)]
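For a long label column, a single-pass Counter should give the same counts without rescanning the list once per label. A sketch only: the short labels list below is made-up data standing in for list(df_data['label']), which is not reproduced here:

```python
from collections import Counter

# Hypothetical stand-in for list(df_data['label']); not the real data.
labels = [0, 1, 0, 0, 1]

label_counts = Counter(labels)

# sorted() gives a stable (label, count) order.
count_list = sorted(label_counts.items())
print(count_list)
```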