This tool made my Python code run 100x faster

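Python is loved by programmers around the world and has topped programming-language rankings for many consecutive editions. Yet it still draws criticism, and one of the most common complaints is that it runs slowly. Wouldn't it be great to have a tool that automatically speeds up our code?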

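So people have tried every trick to make Python code run faster, mostly by adjusting coding habits and hand-optimizing the code itself. But paying too much attention to these details while writing everyday code makes programming unpleasant and can even hurt productivity.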

Today I'm introducing exactly such a tool: Taichi.


Taichi

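Taichi started at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL). It was originally designed to make daily work easier for computer graphics researchers, helping them quickly implement visual computing and physics simulation algorithms that run on the GPU.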

In plain terms, Taichi is a domain-specific language (DSL) embedded in Python and designed for high-performance parallel computing.

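It was built to serve academia, but ordinary developers like us can use it in everyday code too (even if that is a bit like using a sledgehammer to crack a nut)!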


Installation

Taichi is distributed as a PyPI package, so you can install it with pip:

pip install taichi

The prerequisites for installing Taichi (at the time of writing) are:

Python: 3.7/3.8/3.9/3.10 (64-bit)
OS: Windows, OS X, and Linux (64-bit)

If the install command fails with an error, try running it again from a command prompt opened in administrator mode.

A small example

Let's start with a small example to get a feel for what it can do.

import time

def is_prime(n):
    # Trial division: n is prime if no k in [2, sqrt(n)] divides it
    result = True
    for k in range(2, int(n**0.5) + 1):
        if n % k == 0:
            result = False
            break
    return result

def count_primes(n: int) -> int:
    # Count the primes in [2, n)
    count = 0
    for k in range(2, n):
        if is_prime(k):
            count += 1

    return count

t0 = time.time()
print(count_primes(1000000))  # same input as the Taichi version below
t1 = time.time()

print(t1 - t0)

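This is the prime-counting example we have often used before. Counting the primes below 100,000 is fairly fast, but at 1,000,000 the run time slows down noticeably, to as much as 3.38 seconds.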

Large or nested for loops in pure Python tend to perform poorly, because the interpreter executes every single iteration.

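All it takes is importing Taichi and decorating the two functions (and, optionally, switching to Taichi's GPU backend) to get a large overall speedup: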

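import time
import taichi as ti

ti.init()  # no arch given, so Taichi defaults to the CPU backend

@ti.func  # Taichi function: callable only from Taichi scope (kernels or other funcs)
def is_prime(n):
    result = True
    for k in range(2, int(n**0.5) + 1):
        if n % k == 0:
            result = False
            break
    return result

@ti.kernel  # Taichi kernel: compiled by Taichi and called from Python
def count_primes(n: int) -> int:
    count = 0
    for k in range(2, n):  # the outermost loop of a kernel is parallelized automatically
        if is_prime(k):
            count += 1  # Taichi performs += in a parallel loop atomically

    return count

t0 = time.time()
print(count_primes(1000000))  # the first call also triggers JIT compilation
t1 = time.time()

print(t1 - t0)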

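Here we only imported the taichi library and added two decorators, and the run time dropped straight to 0.1 seconds, more than 30 times faster. The larger the input, the more pronounced the speedup becomes!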
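One caveat about these timings: the timed region wraps the very first call to `count_primes`, and a `@ti.kernel` is compiled the first time it is called, so the measurement includes that one-time compilation cost. A fairer comparison warms up first and then times repeated calls. The sketch below does this with the standard library only. The helper `best_time` and its `warmup`/`repeat` parameters are hypothetical names, not part of Taichi, and the demo uses the pure-Python `count_primes` so it runs without Taichi installed.

```python
import time
from typing import Callable


def best_time(fn: Callable, *args, warmup: int = 1, repeat: int = 5):
    """Hypothetical helper: return (result, best wall-clock seconds).

    The warm-up calls absorb one-time costs such as JIT compilation,
    so only the steady-state runs are timed.
    """
    result = None
    for _ in range(warmup):
        result = fn(*args)
    best = float("inf")
    for _ in range(repeat):
        t0 = time.perf_counter()  # perf_counter is meant for timing short intervals
        result = fn(*args)
        best = min(best, time.perf_counter() - t0)
    return result, best


# Pure-Python baseline from the article, used so the sketch runs anywhere.
def is_prime(n):
    result = True
    for k in range(2, int(n**0.5) + 1):
        if n % k == 0:
            result = False
            break
    return result


def count_primes(n: int) -> int:
    count = 0
    for k in range(2, n):
        if is_prime(k):
            count += 1
    return count


if __name__ == "__main__":
    primes, seconds = best_time(count_primes, 100000)
    print(primes, f"{seconds:.3f} s")
```

Because a Taichi kernel is called like an ordinary Python function, the same `best_time(count_primes, 1000000)` call works on the `@ti.kernel` version after `ti.init()`.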

Without Taichi, counting the primes below 10,000,000 took 90 seconds; with Taichi, running on the GPU backend, it took 0.1 seconds.
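Putting the author's reported timings side by side as speedup ratios:

```latex
\frac{t_{\text{Python}}}{t_{\text{Taichi}}}:\qquad
\frac{3.38\ \text{s}}{0.1\ \text{s}} \approx 34\ \ (n = 10^{6},\ \text{CPU}),
\qquad
\frac{90\ \text{s}}{0.1\ \text{s}} = 900\ \ (n = 10^{7},\ \text{GPU})
```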

We can also switch Taichi's backend from CPU to GPU (if no supported GPU is available, Taichi falls back to the CPU):

ti.init(arch=ti.gpu)

Physics simulation with Taichi

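The animation above simulates a piece of cloth falling onto a sphere. The cloth is modeled as a mass-spring system with more than 10,000 mass points and about 100,000 springs. Simulating a physical system this large and rendering it in real time is no easy task.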
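For reference, here are the forces that the `substep()` kernel in the source code accumulates on mass point $i$, written out as equations. The notation is ours, read directly off the code: $\mathbf{x}_{ij} = \mathbf{x}_i - \mathbf{x}_j$, $\mathbf{v}_{ij} = \mathbf{v}_i - \mathbf{v}_j$, $\hat{\mathbf{d}}_{ij} = \mathbf{x}_{ij} / \lVert \mathbf{x}_{ij} \rVert$, $h$ = `quad_size`, $k_s$ = `spring_Y`, $k_d$ = `dashpot_damping`, $c$ = `drag_damping`, $\mathcal{N}(i)$ is the set of neighbors given by `spring_offsets`, and the rest length $l_{ij}$ is `quad_size` times the length of the grid-index offset between $i$ and $j$. Every point has unit mass, since the code adds `force * dt` directly to the velocity.

```latex
\mathbf{f}_i = \sum_{j \in \mathcal{N}(i)} \left[
  -k_s \left( \frac{\lVert \mathbf{x}_{ij} \rVert}{l_{ij}} - 1 \right) \hat{\mathbf{d}}_{ij}
  - k_d\, h \left( \mathbf{v}_{ij} \cdot \hat{\mathbf{d}}_{ij} \right) \hat{\mathbf{d}}_{ij}
\right]

\mathbf{v}_i \leftarrow e^{-c\,\Delta t} \left( \mathbf{v}_i + \Delta t\,\mathbf{g} + \Delta t\,\mathbf{f}_i \right),
\qquad
\mathbf{x}_i \leftarrow \mathbf{x}_i + \Delta t\,\mathbf{v}_i
```

Between the drag and the position update, the code also removes the velocity component pointing into the ball for any point inside it. Updating the velocity first and then moving the point with the new velocity is the semi-implicit (symplectic) Euler scheme.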

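Taichi makes physics simulation programs more readable and intuitive while still delivering performance comparable to C++ or CUDA. With only basic Python skills, you can use Taichi to write high-performance parallel programs in less code, focus on the higher-level algorithm itself, and leave tasks such as performance optimization to Taichi.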

Here is the full source code:

import taichi as ti
ti.init(arch=ti.vulkan)  # Alternatively, ti.init(arch=ti.cpu)

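n = 128                       # cloth resolution: n x n mass points
quad_size = 1.0 / n           # rest spacing between neighboring mass points
dt = 4e-2 / n                 # simulation time step
substeps = int(1 / 60 // dt)  # substeps per rendered frame (~1/60 s)

gravity = ti.Vector([0, -9.8, 0])
spring_Y = 3e4                # spring stiffness
dashpot_damping = 1e4         # damping of relative velocity along each spring
drag_damping = 1              # air drag coefficient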

ball_radius = 0.3
ball_center = ti.Vector.field(3, dtype=float, shape=(1, ))
ball_center[0] = [0, 0, 0]

x = ti.Vector.field(3, dtype=float, shape=(n, n))
v = ti.Vector.field(3, dtype=float, shape=(n, n))

num_triangles = (n - 1) * (n - 1) * 2
indices = ti.field(int, shape=num_triangles * 3)
vertices = ti.Vector.field(3, dtype=float, shape=n * n)
colors = ti.Vector.field(3, dtype=float, shape=n * n)

bending_springs = False

@ti.kernel
def initialize_mass_points():
    random_offset = ti.Vector([ti.random() - 0.5, ti.random() - 0.5]) * 0.1

    for i, j in x:
        x[i, j] = [
            i * quad_size - 0.5 + random_offset[0], 0.6,
            j * quad_size - 0.5 + random_offset[1]
        ]
        v[i, j] = [0, 0, 0]


@ti.kernel
def initialize_mesh_indices():
    for i, j in ti.ndrange(n - 1, n - 1):
        quad_id = (i * (n - 1)) + j
        # 1st triangle of the square
        indices[quad_id * 6 + 0] = i * n + j
        indices[quad_id * 6 + 1] = (i + 1) * n + j
        indices[quad_id * 6 + 2] = i * n + (j + 1)
        # 2nd triangle of the square
        indices[quad_id * 6 + 3] = (i + 1) * n + j + 1
        indices[quad_id * 6 + 4] = i * n + (j + 1)
        indices[quad_id * 6 + 5] = (i + 1) * n + j

    for i, j in ti.ndrange(n, n):
        if (i // 4 + j // 4) % 2 == 0:
            colors[i * n + j] = (0.22, 0.72, 0.52)
        else:
            colors[i * n + j] = (1, 0.334, 0.52)

initialize_mesh_indices()

spring_offsets = []
if bending_springs:
    for i in range(-1, 2):
        for j in range(-1, 2):
            if (i, j) != (0, 0):
                spring_offsets.append(ti.Vector([i, j]))

else:
    for i in range(-2, 3):
        for j in range(-2, 3):
            if (i, j) != (0, 0) and abs(i) + abs(j) <= 2:
                spring_offsets.append(ti.Vector([i, j]))

@ti.kernel
def substep():
    for i in ti.grouped(x):
        v[i] += gravity * dt

    for i in ti.grouped(x):
        force = ti.Vector([0.0, 0.0, 0.0])
        for spring_offset in ti.static(spring_offsets):
            j = i + spring_offset
            if 0 <= j[0] < n and 0 <= j[1] < n:
                x_ij = x[i] - x[j]
                v_ij = v[i] - v[j]
                d = x_ij.normalized()
                current_dist = x_ij.norm()
                original_dist = quad_size * float(i - j).norm()
                # Spring force
                force += -spring_Y * d * (current_dist / original_dist - 1)
                # Dashpot damping
                force += -v_ij.dot(d) * d * dashpot_damping * quad_size

        v[i] += force * dt

    for i in ti.grouped(x):
        v[i] *= ti.exp(-drag_damping * dt)
        offset_to_center = x[i] - ball_center[0]
        if offset_to_center.norm() <= ball_radius:
            # Velocity projection
            normal = offset_to_center.normalized()
            v[i] -= min(v[i].dot(normal), 0) * normal
        x[i] += dt * v[i]

@ti.kernel
def update_vertices():
    for i, j in ti.ndrange(n, n):
        vertices[i * n + j] = x[i, j]

window = ti.ui.Window("Taichi Cloth Simulation on GGUI", (1024, 1024),
                      vsync=True)
canvas = window.get_canvas()
canvas.set_background_color((1, 1, 1))
scene = ti.ui.Scene()
camera = ti.ui.Camera()  # newer Taichi versions deprecate ti.ui.make_camera()

current_t = 0.0
initialize_mass_points()

while window.running:
    if current_t > 1.5:
        # Reset
        initialize_mass_points()
        current_t = 0

    for i in range(substeps):
        substep()
        current_t += dt
    update_vertices()

    camera.position(0.0, 0.0, 3)
    camera.lookat(0.0, 0.0, 0)
    scene.set_camera(camera)

    scene.point_light(pos=(0, 1, 2), color=(1, 1, 1))
    scene.ambient_light((0.5, 0.5, 0.5))
    scene.mesh(vertices,
               indices=indices,
               per_vertex_color=colors,
               two_sided=True)

    # Draw a smaller ball to avoid visual penetration
    scene.particles(ball_center, radius=ball_radius * 0.95, color=(0.5, 0.42, 0.8))
    canvas.scene(scene)
    window.show()
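In the code above, `spring_offsets` decides which grid neighbors each mass point is connected to by a spring. The plain-Python sketch below reproduces the same two loops with tuples in place of `ti.Vector`, so you can inspect the result without Taichi; the function name `neighbor_offsets` is hypothetical.

```python
def neighbor_offsets(bending_springs: bool):
    """Reproduce the spring_offsets loops from the cloth example with plain tuples."""
    offsets = []
    if bending_springs:
        # Full 3x3 neighborhood minus the center: 8 offsets
        for i in range(-1, 2):
            for j in range(-1, 2):
                if (i, j) != (0, 0):
                    offsets.append((i, j))
    else:
        # 5x5 neighborhood limited to Manhattan distance <= 2: 12 offsets
        for i in range(-2, 3):
            for j in range(-2, 3):
                if (i, j) != (0, 0) and abs(i) + abs(j) <= 2:
                    offsets.append((i, j))
    return offsets


if __name__ == "__main__":
    for flag in (True, False):
        offs = neighbor_offsets(flag)
        print(flag, len(offs), offs)
```

With the default `bending_springs = False`, every interior point is linked to 12 neighbors: the 4 adjacent along the axes, the 4 diagonal ones, and the 4 that lie two steps away along an axis. With `bending_springs = True`, it uses the 8 points of the surrounding 3x3 block instead.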

If you're interested, study how the code works in detail. A simulation of this size would be sluggish in plain Python, but with Taichi it runs remarkably smoothly!
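To make the update order of the `substep()` kernel concrete, here is a plain-Python sketch of the same steps for just two mass points joined by one spring: gravity, spring and dashpot forces, exponential drag, then the position update. It is an illustration under stated assumptions, not the article's code: `substep_py` and the vector helpers are hypothetical names, the constants are copied from the example with n = 128, the ball collision is omitted, and unlike the parallel kernel it computes all forces before updating any velocity. Loops like this, run point by point, are exactly what Taichi parallelizes.

```python
import math

# Constants copied from the article's cloth example (n = 128).
H = 1.0 / 128          # quad_size: rest spacing between grid neighbors
DT = 4e-2 / 128        # dt
K_S = 3e4              # spring_Y: spring stiffness
K_D = 1e4              # dashpot_damping
C_DRAG = 1.0           # drag_damping
GRAVITY = (0.0, -9.8, 0.0)


# Small 3-vector helpers on tuples (hypothetical, for illustration only).
def add(a, b):
    return tuple(p + q for p, q in zip(a, b))


def sub(a, b):
    return tuple(p - q for p, q in zip(a, b))


def scale(a, s):
    return tuple(p * s for p in a)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def substep_py(x, v, springs):
    """One substep for unit-mass points.

    x, v: lists of 3-tuples; springs: list of (i, j, rest_length).
    Follows the order of the Taichi substep() kernel, minus the ball collision.
    """
    # 1) Gravity
    v = [add(vi, scale(GRAVITY, DT)) for vi in v]
    # 2) Spring + dashpot forces; each spring acts equally and oppositely on its ends
    forces = [(0.0, 0.0, 0.0) for _ in x]
    for i, j, rest in springs:
        x_ij = sub(x[i], x[j])
        v_ij = sub(v[i], v[j])
        dist = norm(x_ij)
        d = scale(x_ij, 1.0 / dist)
        f = scale(d, -K_S * (dist / rest - 1))          # spring force
        f = add(f, scale(d, -dot(v_ij, d) * K_D * H))   # dashpot damping
        forces[i] = add(forces[i], f)
        forces[j] = sub(forces[j], f)
    v = [add(vi, scale(fi, DT)) for vi, fi in zip(v, forces)]
    # 3) Exponential air drag
    decay = math.exp(-C_DRAG * DT)
    v = [scale(vi, decay) for vi in v]
    # 4) Move each point with its new velocity (semi-implicit Euler)
    x = [add(xi, scale(vi, DT)) for xi, vi in zip(x, v)]
    return x, v


if __name__ == "__main__":
    # Two points stretched to twice the rest length H, starting at rest
    x = [(0.0, 0.0, 0.0), (2 * H, 0.0, 0.0)]
    v = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
    x, v = substep_py(x, v, [(0, 1, H)])
    print(norm(sub(x[1], x[0])), "<", 2 * H)
```

Starting from a stretched spring, one substep pulls the two points toward each other while gravity accelerates both downward by the same amount.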

Summary

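Taichi was created by a Chinese developer, Yuanming Hu (胡渊鸣), a Tsinghua University graduate who went on to pursue graduate studies at MIT.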

Original article: https://blog.csdn.net/qq_34160248/article/details/127856008