Notes from 《Python高性能(第2版)》 (Python High Performance, 2nd Edition)
list is backed by an array: inserting or deleting at the front is O(n), inserting or deleting at the end is O(1), and accessing any element is O(1).
deque is backed by a doubly linked list: inserting and deleting at either end are both O(1), at the cost of O(n) access to elements in the middle.
The bisect module searches a sorted list in O(log n).
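A quick sketch of these trade-offs (my own example, not from the book):
from collections import deque
import bisect

d = deque([1, 2, 3])
d.appendleft(0)   # O(1); list.insert(0, x) would be O(n)
d.append(4)       # O(1), same as list.append
d.popleft()       # O(1)

sorted_list = [1, 3, 5, 7]
bisect.bisect_left(sorted_list, 5)   # O(log n) binary search -> index 2
bisect.insort(sorted_list, 4)        # finds the slot in O(log n); the insert itself is O(n)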
Finding all documents that contain the word X among N documents:
[doc for doc in docs if 'X' in doc]
When N is very large, this is very inefficient.
Instead, build a mapping word : [ids of the docs containing that word] once up front; every query after that is O(1).
Sets are also hash-based under the hood, so membership tests are O(1).
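A minimal sketch of the inverted index (the names are mine); storing the doc ids in sets also makes multi-word queries cheap via set intersection:
from collections import defaultdict

docs = ['the cat sat', 'the dog ran', 'a cat and a dog']
index = defaultdict(set)
for doc_id, doc in enumerate(docs):
    for word in doc.split():
        index[word].add(doc_id)      # built once, O(total number of words)

index['cat']                  # {0, 2}: a single O(1) lookup
index['cat'] & index['dog']   # {2}: docs containing both words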
Heaps: inserting and removing elements are both O(log n). Python provides them in the heapq module:
>>> import heapq
>>> c = [10,1,3,2,5]
>>> heapq.heapify(c)
>>> c
[1, 2, 3, 10, 5]
>>> heapq.heappop(c) # pop the smallest value: O(log n) (peeking at c[0] is O(1))
1
>>> heapq.heappush(c, 9)
>>> c
[2, 5, 3, 10, 9]
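heapq also provides helpers for the k smallest or largest values; continuing the session above:
>>> heapq.nsmallest(2, c) # k smallest without popping
[2, 3]
>>> c[0] # peeking at the minimum is O(1)
2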
queue.PriorityQueue is a priority queue that is also thread-safe (it is not process-safe; for that, use the multiprocessing queues):
>>> from queue import PriorityQueue
>>> q = PriorityQueue()
>>> for x in [2, 5, 10, 9, 1]:
...     q.put(x)
...
>>> q
<queue.PriorityQueue object at 0x0000017DD91EED08>
>>> while not q.empty():
...     q.get()  # get the smallest item each time
...
1
2
5
9
10
If the items are tuples, lists, etc., they are ordered by their first element; on a tie, by the second, and so on:
>>> q.put([1,2])
>>> q.put([1,3])
>>> q.put([1,-1])
>>> q.get()
[1, -1]
>>> q.get()
[1, 2]
>>> q.get()
[1, 3]
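A minimal sketch of the thread-safety claim above (the worker setup is my own): two threads can share one PriorityQueue with no extra locking:
from queue import PriorityQueue
from threading import Thread

q = PriorityQueue()

def worker():
    while True:
        priority, task = q.get()   # blocks until an item is available
        if task is None:           # sentinel value: stop the worker
            break
        print(priority, task)

t = Thread(target=worker)
t.start()
q.put((2, 'low'))
q.put((1, 'high'))
q.put((99, None))   # lowest priority, so the sentinel drains after the real work
t.join()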
Tries: the standard library has no implementation, but third-party packages provide one.
A trie supports fast lookup of strings by prefix and can be used for text autocompletion.
pip install patricia-trie
There are also trie libraries written in optimized C: datrie and marisa-trie.
https://pypi.org/project/datrie/
https://pypi.org/project/marisa-trie/0.7.1/
from random import choice
from string import ascii_lowercase
import cProfile

from patricia import trie

def random_string(length):
    # Build a random string of lowercase ASCII letters
    return ''.join(choice(ascii_lowercase) for i in range(length))

def gen_strings():
    strings = [random_string(32) for _ in range(100000)]
    return strings

strings = gen_strings()
strings_dict = {s: 0 for s in strings}
strings_trie = trie(**strings_dict)

def func1():
    # Linear scan: test every string's prefix, O(N) per query
    matches = [s for s in strings if s.startswith('abc')]
    print(matches)

def func2():
    # Trie lookup: walk the prefix once, then enumerate the matches
    matches = list(strings_trie.iter('abc'))
    print(matches)

if __name__ == "__main__":
    cProfile.run('func1()', sort='tottime')
    cProfile.run('func2()', sort='tottime')
Output:
['abcimostjvrhrhuswjhshhsnoyawtuaq', 'abcymrvuyenkyuppjuwnstzpaxarjpjo', 'abcbltxrvuktanyoiezntijmaryojbvg', 'abckxrzwjmdjaekmiqquezgwjabdhohe', 'abcecoluitahquyqxoidjiwaupatgszd', 'abcppzcrqjxgzuoiuskethybnlgbougt', 'abcbaznirxvyhnvabmbhwmsefdtultpj']
100006 function calls in 0.036 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
100000 0.019 0.000 0.019 0.000 {method 'startswith' of 'str' objects}
1 0.017 0.017 0.036 0.036 trie.py:18(<listcomp>)
1 0.000 0.000 0.036 0.036 {built-in method builtins.exec}
1 0.000 0.000 0.000 0.000 {built-in method builtins.print}
1 0.000 0.000 0.036 0.036 trie.py:17(func1)
1 0.000 0.000 0.036 0.036 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
['abcimostjvrhrhuswjhshhsnoyawtuaq', 'abcymrvuyenkyuppjuwnstzpaxarjpjo', 'abcbltxrvuktanyoiezntijmaryojbvg', 'abcbaznirxvyhnvabmbhwmsefdtultpj', 'abckxrzwjmdjaekmiqquezgwjabdhohe', 'abcecoluitahquyqxoidjiwaupatgszd', 'abcppzcrqjxgzuoiuskethybnlgbougt']
83 function calls (66 primitive calls) in 0.000 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 {built-in method builtins.exec}
25/8 0.000 0.000 0.000 0.000 patricia.py:57(_items)
1 0.000 0.000 0.000 0.000 {built-in method builtins.print}
3 0.000 0.000 0.000 0.000 patricia.py:161(_next)
1 0.000 0.000 0.000 0.000 trie.py:21(func2)
1 0.000 0.000 0.000 0.000 patricia.py:354(iter)
8 0.000 0.000 0.000 0.000 patricia.py:51(_keys)
7 0.000 0.000 0.000 0.000 {method 'join' of 'str' objects}
9 0.000 0.000 0.000 0.000 {method 'values' of 'dict' objects}
8 0.000 0.000 0.000 0.000 {method 'pop' of 'list' objects}
1 0.000 0.000 0.000 0.000 patricia.py:366(_accumulate)
8 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 <string>:1(<module>)
3 0.000 0.000 0.000 0.000 {method 'startswith' of 'str' objects}
5 0.000 0.000 0.000 0.000 {built-in method builtins.len}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
As the profiles show, the trie lookup is at least several dozen times faster than the linear scan.
Memoization example: computing the Fibonacci sequence (dynamic programming).
Python provides the functools.lru_cache decorator; the cache size can be set with its maxsize argument.
>>> from functools import lru_cache
>>> @lru_cache()
... def sum2(a, b):
...     print('calculate')
...     return a + b
...
>>> sum2(1,2)
calculate
3
>>> sum2(1,2) # no computation this time; the cached result is returned directly
3
>>> sum2(1,3)
calculate
4
Inspecting the cache statistics:
>>> sum2.cache_info()
CacheInfo(hits=1, misses=2, maxsize=128, currsize=2)
>>> sum2.cache_clear()
>>> sum2.cache_info()
CacheInfo(hits=0, misses=0, maxsize=128, currsize=0)
from functools import lru_cache
import cProfile

@lru_cache(maxsize=None)  # unbounded cache: memoize every call
def fibonacci_mem(n):
    if n < 1: return 1
    else: return fibonacci_mem(n - 1) + fibonacci_mem(n - 2)

def fibonacci(n):
    if n < 1: return 1
    else: return fibonacci(n - 1) + fibonacci(n - 2)

if __name__ == '__main__':
    cProfile.run('fibonacci(30)', sort='tottime')
    print(fibonacci_mem)
    cProfile.run('fibonacci_mem(30)', sort='tottime')
Output:
4356620 function calls (4 primitive calls) in 1.695 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
4356617/1 1.695 0.000 1.695 1.695 cache.py:8(fibonacci)
1 0.000 0.000 1.695 1.695 {built-in method builtins.exec}
1 0.000 0.000 1.695 1.695 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
<functools._lru_cache_wrapper object at 0x000001882C23E550>
35 function calls (4 primitive calls) in 0.000 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 {built-in method builtins.exec}
32/1 0.000 0.000 0.000 0.000 cache.py:3(fibonacci_mem)
1 0.000 0.000 0.000 0.000 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
The joblib package provides a simple disk-based cache:
pip install joblib
import os
from joblib import Memory

# Cache results on disk next to this script
# (newer joblib versions take location= instead of cachedir=)
mem = Memory(cachedir=os.path.join(os.path.dirname(__file__), 'cache_dir'))

@mem.cache
def fibonacci(n):
    if n < 1: return 1
    else: return fibonacci(n - 1) + fibonacci(n - 2)

if __name__ == '__main__':
    fibonacci(32)
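On a second run of the script, fibonacci(32) returns almost immediately: Memory hashes the function's arguments and loads the stored result from cache_dir instead of recomputing it, so unlike lru_cache the cache survives across interpreter sessions.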

Comprehensions and generators can both replace explicit for loops, and they are faster than the equivalent loop.
When you iterate over a generator, only one value is produced at a time, which keeps memory usage low.
In Jupyter:
%load_ext memory_profiler
numbers = range(1000000)

def map_comprehension(numbers):
    a = [n * 2 for n in numbers]
    b = [n ** 2 for n in a]
    c = [n ** 0.33 for n in b]
    return max(c)

%memit map_comprehension(numbers)
peak memory: 236.46 MiB, increment: 112.98 MiB
def map_normal(numbers):
    a = map(lambda n: n * 2, numbers)
    b = map(lambda n: n ** 2, a)
    c = map(lambda n: n ** 0.33, b)
    return max(c)

%memit map_normal(numbers)
peak memory: 123.73 MiB, increment: 0.00 MiB
As the measurements show, the lazy map version is far more memory-efficient: no intermediate lists are ever built.
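The same effect can be had with generator expressions instead of map (my own variant, not from the book); it should show a similarly flat memory profile, since nothing is materialized until max consumes the chain:
def map_genexp(numbers):
    a = (n * 2 for n in numbers)
    b = (n ** 2 for n in a)
    c = (n ** 0.33 for n in b)
    return max(c)

%memit map_genexp(numbers)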