Python 工匠 (Python Craftsman), Chapter 3: Container Types

Basics

Common list operations

list(iterable) converts any iterable object into a list.

>>> from collections.abc import Iterable
>>> isinstance("ABC", Iterable)
True
>>> list("ABC")
['A', 'B', 'C']

>>> chars = list("ABC")
>>> del chars[1:]
>>> chars
['A']

Getting the index while iterating

>>> chars = list("ABC")
>>> for i, c in enumerate(chars, start=2):
...     print(i, c)
...
2 A
3 B
4 C

List comprehensions

>>> nums = [1, 2, 3, 4]
>>> [n * 10 for n in nums if n % 2 == 0]
[20, 40]

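Do not write overly complex list comprehensions (they are hard to read).

Do not treat a comprehension as a loop with fewer lines of code.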

[process(task) for task in tasks]

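The point of a comprehension is that it returns a value. The call above discards its result, so it should be written as a plain loop.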
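The contrast can be sketched as follows. The process function and the task names here are hypothetical stand-ins, not something from the book:

```python
def process(task):
    # Hypothetical stand-in for real work; returns a transformed value.
    return task.upper()


tasks = ["a", "b"]

# Only the side effect matters and the results are discarded:
# a plain loop states that intent directly.
for task in tasks:
    process(task)

# The results are needed: a comprehension is the right tool.
results = [process(task) for task in tasks]
print(results)  # ['A', 'B']
```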

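Understanding list mutability

Mutable: list, dict, set
Immutable: int, float, str, bytes, tuple

When Python passes arguments in a function call, it passes a reference to the object that the variable points to (pass-by-object-reference).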
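A minimal sketch of what this means in practice. The function names add_item and rebind are made up for illustration:

```python
def add_item(items, value):
    # items refers to the same list object as the caller's variable,
    # so an in-place change is visible outside the function.
    items.append(value)


def rebind(number):
    # int is immutable: += creates a new object and rebinds the local name,
    # leaving the caller's variable untouched.
    number += 1
    return number


nums = [1, 2]
add_item(nums, 3)
print(nums)   # [1, 2, 3]

count = 10
rebind(count)
print(count)  # 10
```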

Common tuple operations

>>> a = 1, 2, 3
>>> a
(1, 2, 3)
>>> type(a)
<class 'tuple'>

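It is the comma, not the parentheses, that the interpreter uses to recognize a tuple.

A function that returns multiple results is really returning a tuple.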
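Both points can be checked with a short sketch. The helper min_max is a hypothetical example function:

```python
def min_max(values):
    # The comma builds a tuple; no parentheses are needed.
    return min(values), max(values)


single = 1,       # a one-element tuple, because of the trailing comma
not_tuple = (1)   # just the integer 1 wrapped in parentheses

result = min_max([3, 1, 2])
low, high = result  # tuple unpacking

print(type(single), type(not_tuple))  # <class 'tuple'> <class 'int'>
print(result, low, high)              # (1, 3) 1 3
```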

There is no tuple comprehension

>>> nums = (1, 2, 3, 4, 5, 6)
>>> results = (n * 10 for n in nums if n % 2 == 0)
>>> results
<generator object <genexpr> at 0x000001D7ADE90E40>
>>> tuple(results)  # a generator object is still iterable
(20, 40, 60)

Storing structured data

Unlike a list, a tuple commonly holds values of different types.

>>> user_info = ("Tom", 24, True)

Named tuples: namedtuple

user_info[1] does return 24, but nothing tells the reader whether that number is an age or something else.

>>> from collections import namedtuple
>>> UserInfo = namedtuple('UserInfo', 'name, age, vip')
>>> user1 = UserInfo('Alice', 21, True)
>>> user1[1]
21
>>> user1.age
21

Alternatively, use typing.NamedTuple with type annotations

>>> from typing import NamedTuple
>>> class UserInfo(NamedTuple):
...     name: str
...     age: int
...     vip: bool
...
>>> user2 = UserInfo('Bob', 12, False)
>>> user2.age
12

Common dictionary operations

Iterating over a dictionary

>>> movies = {'name': 'Burning', 'year': 2018}
>>> for key in movies:
...     print(key, movies[key])
...
name Burning
year 2018
>>> for key, value in movies.items():
...     print(key, value)
...
name Burning
year 2018

Accessing a key that does not exist

>>> movies["rating"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'rating'

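# 1: check for the key with `in` first
>>> if 'rating' in movies:
...     rating = movies['rating']
... else:
...     rating = 0
...
>>> rating
0
# 2: try the lookup and catch KeyError
>>> try:
...     rating = movies['rating']
... except KeyError:
...     rating = 0
...
>>> rating
0
# 3: use dict.get with a default value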
>>> movies.get('rating', 0)
0

Using setdefault to fetch a value and modify it

>>> try:
...     movies['rating'] += 1
... except KeyError:
...     movies['rating'] = 0
...
>>> movies
{'name': 'Burning', 'year': 2018, 'rating': 0}

>>> movies = {'name': 'Burning', 'year': 2018}
>>> movies.setdefault('rating', 0)  # rating does not exist, so it is set to 0
0
>>> movies
{'name': 'Burning', 'year': 2018, 'rating': 0}
>>> movies.setdefault('rating', 1)  # rating exists, so its current value is returned
0
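Because setdefault returns the stored value, the fetch and the modification can happen in one expression. A small sketch with made-up data, grouping words by their first letter:

```python
words = ["apple", "avocado", "banana"]
by_letter = {}

for word in words:
    # setdefault returns the existing list for this key, or stores a new
    # empty list and returns it; either way we can append immediately.
    by_letter.setdefault(word[0], []).append(word)

print(by_letter)  # {'a': ['apple', 'avocado'], 'b': ['banana']}
```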

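Using pop to delete a key that may not exist

del d[key] deletes a key from a dictionary, but raises KeyError if the key does not exist.
Alternatively,
d.pop(key, None) removes the key and returns its value if the key exists; otherwise it returns None instead of raising.

Of course, the main purpose of pop is to take out the value that belongs to a key.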
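A short sketch of both cases, reusing the movies dictionary from the earlier examples:

```python
movies = {"name": "Burning", "year": 2018}

# The key exists: pop removes it and returns its value.
year = movies.pop("year", None)

# The key is missing: the default is returned and no KeyError is raised.
rating = movies.pop("rating", None)

print(year, rating, movies)  # 2018 None {'name': 'Burning'}
```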

Dictionary comprehensions

>>> d1 = {'foo': 3, 'bar': 4}
>>> {key: value * 10 for key, value in d1.items() if key == 'foo'}
{'foo': 30}

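Ordered and unordered dictionaries

Since Python 3.7, dictionaries are guaranteed to keep insertion order (in CPython 3.6 this was an implementation detail).
from collections import OrderedDict

Compared with the built-in dict and its ordering, OrderedDict has the following characteristics:

Its name makes the ordering explicit
Key order matters in comparisons (plain dicts ignore order when compared)
It has methods such as .move_to_end() that plain dicts lack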
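The last two points can be seen in a small sketch (the keys and values are arbitrary):

```python
from collections import OrderedDict

d1 = {"a": 1, "b": 2}
d2 = {"b": 2, "a": 1}
plain_equal = d1 == d2        # order is ignored for plain dicts
print(plain_equal)            # True

od1 = OrderedDict(a=1, b=2)
od2 = OrderedDict(b=2, a=1)
ordered_equal = od1 == od2    # order matters between two OrderedDicts
print(ordered_equal)          # False

od1.move_to_end("a")          # move the key "a" to the last position
print(list(od1))              # ['b', 'a']
```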

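Common set operations

Sets are unordered and mutable

Immutable set: frozenset

.add() adds a new member to a set; a frozenset is immutable and has no such method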

>>> f_set = frozenset(['foo', 'bar'])
>>> f_set
frozenset({'bar', 'foo'})
>>> f_set.add('apple')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'frozenset' object has no attribute 'add'

Set operations

Intersection &
Union |
Difference - (items in the first set that are not in the second)
There are also symmetric_difference, issubset, and others
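The operators listed above, applied to two small example sets:

```python
a = {1, 2, 3}
b = {3, 4}

print(a & b)                      # {3}
print(a | b)                      # {1, 2, 3, 4}
print(a - b)                      # {1, 2}
print(a.symmetric_difference(b))  # {1, 2, 4}
print({1, 2}.issubset(a))         # True
```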

Sets can only hold hashable objects

>>> invalid_set = {'foo', [1, 2, 3]}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

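Hashability

Immutable built-in types such as str, int, tuple, and frozenset are hashable
Mutable built-in types such as dict and list are not hashable
An immutable container type such as tuple or frozenset is hashable only when all of its members are themselves hashable (in practice, immutable)
Instances of user-defined classes are hashable by default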
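These rules can be checked with a small helper. The is_hashable function and the Point class are hypothetical, written only for this sketch:

```python
class Point:
    # A user-defined class that defines neither __eq__ nor __hash__,
    # so it keeps the default identity-based hash.
    pass


def is_hashable(obj):
    # hash() raises TypeError for unhashable objects.
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable("abc"))      # True
print(is_hashable((1, 2)))     # True
print(is_hashable((1, [2])))   # False: the tuple contains a list
print(is_hashable([1, 2]))     # False
print(is_hashable(Point()))    # True
```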

Deep copy and shallow copy

>>> nums = [1, 2, 3, 4]
>>> nums_copy = nums  # only binds a second name to the same object; nothing is copied
>>> nums[2] = 30
>>> nums, nums_copy
([1, 2, 30, 4], [1, 2, 30, 4])

The difference between deep and shallow copies: a shallow copy cannot protect against changes to nested objects!

Shallow copy

1 copy.copy()

>>> import copy
>>> nums = [1, 2, 3, 4]
>>> nums_copy = copy.copy(nums)
>>> nums[2] = 30
>>> nums, nums_copy
([1, 2, 30, 4], [1, 2, 3, 4])

2 Comprehensions

>>> d = {'foo': 1}
>>> d2 = {key: value for key, value in d.items()}
>>> d['foo'] = 2
>>> d, d2
({'foo': 2}, {'foo': 1})

3 The built-in constructor of each container type

>>> d = {'foo': 1}
>>> d2 = dict(d.items())
>>> d, d2
({'foo': 1}, {'foo': 1})

>>> nums = [1, 2, 3, 4]
>>> nums_copy = list(nums)
>>> nums, nums_copy
([1, 2, 3, 4], [1, 2, 3, 4])

4 Full slice

>>> nums = [1, 2, 3, 4]
>>> nums_copy = nums[:]
>>> nums, nums_copy
([1, 2, 3, 4], [1, 2, 3, 4])

5 The copy method that some types provide themselves

>>> nums = [1, 2, 3, 4]
>>> nums_copy = nums.copy()
>>> nums, nums_copy
([1, 2, 3, 4], [1, 2, 3, 4])

>>> d = {'foo': 1}
>>> d2 = d.copy()
>>> d, d2
({'foo': 1}, {'foo': 1})

Deep copy

>>> items = [1, ['foo', 'bar'], 2, 3]
>>> items_copy = copy.copy(items)
>>> items[0] = 100
>>> items[1].append('xxx')
>>> items, items_copy
([100, ['foo', 'bar', 'xxx'], 2, 3], [1, ['foo', 'bar', 'xxx'], 2, 3])

>>> id(items[1]), id(items_copy[1])
(2025849749952, 2025849749952)

As you can see, after a shallow copy items[1] and items_copy[1] still refer to the same object

items_copy = copy.deepcopy(items)
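Rerunning the same experiment with copy.deepcopy shows the nested list being copied as well. This is a self-contained sketch that repeats the setup from the shallow-copy example:

```python
import copy

items = [1, ["foo", "bar"], 2, 3]
items_copy = copy.deepcopy(items)

# Modify the nested list through the original only.
items[1].append("xxx")

print(items)                      # [1, ['foo', 'bar', 'xxx'], 2, 3]
print(items_copy)                 # [1, ['foo', 'bar'], 2, 3]
print(items[1] is items_copy[1])  # False: the nested list was copied too
```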

Case study

Programming tips

Return results on demand instead of a container

Generators

>>> def generate_even(max_number):
...     for i in range(0, max_number):
...         if i % 2 == 0:
...             yield i
...
>>> i = generate_even(10)
>>> next(i)
0
>>> next(i)
2
>>> next(i)
4

Replacing lists with generators

def batch_process(items):
    results = []
    for item in items:
        # processed_item = ...
        results.append(processed_item)
    return results

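The code above has two problems:

If items is large, processing takes a long time, and results also takes up a lot of memory
The caller cannot interrupt the processing early

Improvement:

def batch_process(items):
    for item in items:
        # processed_item = ...
        yield processed_item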

Interrupting under certain conditions:

for processed_item in batch_process(items):
    # If one object has expired, skip processing all the remaining ones
    if processed_item.has_expired():
        break
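To make the early exit observable, here is a runnable sketch. The processing step (squaring a number), the seen list, and the stop condition are all assumptions standing in for the elided processing code and has_expired():

```python
seen = []  # records which items the generator actually touched


def batch_process(items):
    for item in items:
        seen.append(item)
        # Hypothetical processing step: square the number.
        yield item * item


processed = []
for value in batch_process([1, 2, 3, 4, 5]):
    # Hypothetical stop condition standing in for has_expired().
    if value > 5:
        break
    processed.append(value)

print(processed)  # [1, 4]
print(seen)       # [1, 2, 3]: items 4 and 5 were never processed
```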

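Understand how containers are implemented under the hood

Avoid the performance traps of lists

Inserting at the tail of a list is much faster than inserting at the head,
because a list is backed by a C array (inserting at the head means every other element has to shift back).

If you really do need to insert at the head often, use collections.deque,
because a deque is a double-ended queue.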
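A brief sketch of the two approaches side by side. The complexity notes in the comments follow the Python time-complexity documentation rather than anything measured here:

```python
from collections import deque

queue = deque([2, 3])
queue.appendleft(1)   # O(1): a deque supports cheap insertion at the head
queue.append(4)       # O(1): and at the tail
print(queue)          # deque([1, 2, 3, 4])

nums = [2, 3]
nums.insert(0, 1)     # O(n): every existing element shifts one slot
nums.append(4)        # amortized O(1)
print(nums)           # [1, 2, 3, 4]
```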

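Also, because a list is backed by an array, checking whether an element is in a list (in) has to scan it and is usually slow (a set is a better choice).

Use a set for membership tests

Sets are implemented with a hash table.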
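A rough way to see the difference on your own machine. The sizes and repeat counts below are arbitrary assumptions, and the timings vary between runs, so no expected numbers are shown:

```python
import timeit

numbers_list = list(range(100_000))
numbers_set = set(numbers_list)  # one-time O(n) conversion

target = 99_999  # worst case for the list: the last element

# Both containers give the same answer ...
found_in_list = target in numbers_list
found_in_set = target in numbers_set

# ... but the list scans element by element, while the set hashes the value.
list_time = timeit.timeit(lambda: target in numbers_list, number=100)
set_time = timeit.timeit(lambda: target in numbers_set, number=100)
print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")
```

The conversion to a set only pays off when the same collection is tested many times.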

TimeComplexity - Python Wiki

Merging dictionaries quickly

>>> d1 = {"name": "apple"}
>>> d2 = {"price": 10}
>>> d3 = {**d1, **d2}
>>> d3
{'name': 'apple', 'price': 10}

>>> d4 = d1 | d2  # requires Python 3.9 or later
>>> d4
{'name': 'apple', 'price': 10}

Other unpacking

>>> [1, 2, *range(10)]
[1, 2, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> [*[1,2], *[3,4]]
[1, 2, 3, 4]

Use an ordered dict to deduplicate

Deduplicating with a set loses the original order

>>> from collections import OrderedDict
>>> nums = [10, 2, 3, 21, 10, 3]
>>> list(OrderedDict.fromkeys(nums).keys())
[10, 2, 3, 21]
# fromkeys: the dictionary's keys come from the iterable; the values default to None
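On Python 3.7 and later, where plain dicts keep insertion order, the same trick should work without OrderedDict. A minimal sketch:

```python
nums = [10, 2, 3, 21, 10, 3]

# dict.fromkeys keeps the first occurrence of each key, in insertion order.
unique = list(dict.fromkeys(nums))
print(unique)  # [10, 2, 3, 21]
```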

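Do not modify a list while iterating over it

The loop index keeps advancing steadily while the list length changes, so elements get skipped and the result is usually wrong.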
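A small sketch of the failure and one common way around it. Both function names are invented for this example:

```python
def remove_even_buggy(numbers):
    # Removing during iteration shifts later elements left, so the element
    # right after each removed one is skipped.
    for n in numbers:
        if n % 2 == 0:
            numbers.remove(n)
    return numbers


def remove_even(numbers):
    # Build a new list instead of mutating the one being iterated.
    return [n for n in numbers if n % 2 != 0]


print(remove_even_buggy([1, 2, 4, 5]))  # [1, 4, 5]: 4 was skipped
print(remove_even([1, 2, 4, 5]))        # [1, 5]
```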

Have functions return a NamedTuple

>>> from typing import NamedTuple
>>> class Address(NamedTuple):
...     country: str
...     city: str
...
>>> def latlon_to_address(lat, lon):
...     # processing
...     return Address(
...         country="country",
...         city="city",
...     )
...
>>> addr = latlon_to_address("lat", "lon")
>>> # addr.country
>>> # addr.city

If we later add district to Address, the existing code logic will keep working
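A sketch of that scenario. The district field, its default value, and the placeholder strings are assumptions made for illustration:

```python
from typing import NamedTuple


class Address(NamedTuple):
    country: str
    city: str
    district: str = ""  # newly added field; the default is an assumption


def latlon_to_address(lat, lon):
    # Placeholder values, in the same spirit as the example above.
    return Address(country="country", city="city", district="district")


addr = latlon_to_address("lat", "lon")

# Callers that read attributes by name are unaffected by the new field.
print(addr.country, addr.city)  # country city
```

Callers that unpack a plain tuple, as in country, city = latlon_to_address(...), would break with a ValueError once a third value is returned, which is the problem that attribute access avoids.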

Original article: https://blog.csdn.net/weixin_44596902/article/details/127876935
