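As a new programming language that emphasizes performance, safety, and concurrency, Rust is attracting more and more attention from programmers. Rust has been voted the most loved language on StackOverflow (the world's largest programming Q&A site) for 7 consecutive years. Linus Torvalds even believes that Rust is about to become Linux's second official language. There is reason to believe that more and more programmers will try learning Rust. However, Rust has a steep learning curve and a high barrier to entry. Therefore, Shi Jicheng (施继成), co-founder and CTO of DatenLord (达坦科技), has compiled his experience of learning and using Rust into a book, which we will serialize on the DatenLord WeChat official account.
Unlike other Rust textbooks on the market, these sparks of thought focus on the essentials of learning Rust and on how to apply Rust to solve problems in real-world scenarios, so that Rust becomes a living, breathing, useful language.
This article is the first chapter of Rust You Don't Know.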
Before discussing asynchronous programming in Rust, let's first look at how the operating system organizes and schedules tasks. This will help us understand the motivation behind language-level asynchronous mechanisms.
People want to run multiple tasks concurrently on an OS even when there is only one CPU core, because a single task usually cannot keep the whole core busy most of the time. Following this idea, we have to answer two questions to arrive at a design: how to abstract a task, and how to schedule tasks on the hardware CPU cores.
Usually, we don't want tasks to affect each other, which means they should run separately and manage their own states. Since states are stored in memory, each task must hold its own memory space to achieve this goal. For instance, the execution flow is a kind of in-memory state that records the current instruction position and the on-stack states. In short, on Linux, processes are tasks with separate memory spaces.
Though memory space separation is one of the key features of processes, processes sometimes have to share some memory. First, the kernel code is the same across all processes, so sharing the kernel part of the memory space avoids unnecessary memory redundancy. Second, processes need to cooperate, so inter-process communication (IPC) is unavoidable, and most high-performance IPC mechanisms are some form of memory sharing or transferring. Given these requirements, sharing the whole memory space between tasks is more convenient in some scenarios, and this is where threads help.
A process can contain one thread (a single-threaded process) or more. Threads in a process share the same memory space, which means most state changes are observable by all of its threads, except for those on each thread's own execution stack. Each thread has its own execution flow and can run concurrently on any CPU core.
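The following is a minimal sketch of the difference between shared and per-thread state. The helper `shared_counter` is hypothetical and written only for illustration. It spawns several threads with `std::thread::scope`, and they all update one `AtomicUsize` that lives in the process's shared memory. Each thread also keeps a private `local` counter on its own stack.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical helper: spawns `n` threads that each do `per_thread` increments.
/// Returns the shared total and each thread's private count.
fn shared_counter(n: usize, per_thread: usize) -> (usize, Vec<usize>) {
    // One counter in memory that every thread of this process can see.
    let counter = AtomicUsize::new(0);
    let locals = thread::scope(|s| {
        let mut handles = Vec::new();
        for _ in 0..n {
            handles.push(s.spawn(|| {
                // `local` lives on this thread's own stack; no other thread touches it.
                let mut local: usize = 0;
                for _ in 0..per_thread {
                    local += 1;
                    counter.fetch_add(1, Ordering::Relaxed);
                }
                local
            }));
        }
        // Join the threads in spawn order and collect their private results.
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .collect::<Vec<usize>>()
    });
    (counter.load(Ordering::Relaxed), locals)
}

fn main() {
    let (total, locals) = shared_counter(4, 1_000);
    println!("shared total = {total}, private counts = {locals:?}");
}
```

Every thread's increments show up in the single shared total, while each `local` value stays private to the thread that owns it.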
Now that we know processes and threads are the basic execution units (tasks) on most OSes, let's try to run them on real hardware: CPU cores.
The first challenge we meet when running processes and threads is limited hardware resources, because the number of CPU cores is limited. At the time of writing, a single x86 CPU can run at most 128 tasks at the same time: the AMD Ryzen™ Threadripper™ PRO 5995WX Processor, with 64 cores and 128 hardware threads. Yet it is easy to create thousands of processes or threads on Linux, so we must decide which CPU core each task runs on and when to stop a task. This is where the OS task scheduler helps.
A scheduler can interrupt an executing task regardless of its state and schedule a new one. This is called preemptive scheduling, and it is used by most OSes, including Linux. Its advantage is that CPU time slices are shared fairly among tasks no matter what they are running, while the tasks themselves know nothing about the scheduler. Interrupting a running task requires a hardware interrupt, such as a timer interrupt.
The other kind of scheduler is the non-preemptive scheduler, which has to cooperate with the tasks while scheduling. Here tasks are never interrupted. Instead, they decide when to release the computing resource. A task usually schedules itself out when it performs an I/O operation, which typically takes a while to complete. Fairness is hard to guarantee, because a task may run forever without yielding, in which case other tasks never get a chance to be scheduled on that core.
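A non-preemptive scheduler can be sketched in a few lines. In this hypothetical example, the `Task` trait, the `Step` enum and the `run` function are illustrative names, not an OS or library API. Each task runs one step and then voluntarily hands control back. The round-robin scheduler only regains the CPU when a task returns.

```rust
use std::collections::VecDeque;

/// What a task reports when it voluntarily gives the CPU back.
enum Step {
    Yield, // not finished: put me back in the queue
    Done,  // finished: drop me
}

/// Hypothetical cooperative task: the scheduler cannot interrupt `step`,
/// it only regains control when `step` returns.
trait Task {
    fn step(&mut self, log: &mut Vec<String>) -> Step;
}

/// Counts up to `limit`, yielding after every number.
struct Counter {
    name: &'static str,
    n: u32,
    limit: u32,
}

impl Task for Counter {
    fn step(&mut self, log: &mut Vec<String>) -> Step {
        self.n += 1;
        log.push(format!("{}{}", self.name, self.n));
        if self.n >= self.limit { Step::Done } else { Step::Yield }
    }
}

/// Round-robin cooperative scheduler; returns the order in which steps ran.
fn run(tasks: Vec<Box<dyn Task>>) -> Vec<String> {
    let mut queue: VecDeque<Box<dyn Task>> = tasks.into();
    let mut log = Vec::new();
    while let Some(mut task) = queue.pop_front() {
        match task.step(&mut log) {
            Step::Yield => queue.push_back(task),
            Step::Done => {} // the finished task is dropped here
        }
    }
    log
}

fn main() {
    let log = run(vec![
        Box::new(Counter { name: "a", n: 0, limit: 2 }) as Box<dyn Task>,
        Box::new(Counter { name: "b", n: 0, limit: 3 }) as Box<dyn Task>,
    ]);
    println!("{}", log.join(" ")); // a1 b1 a2 b2 b3
}
```

Because the scheduler never interrupts `step`, a task whose `step` looped forever would starve every other task in the queue. This is the fairness problem of non-preemptive scheduling.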
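No matter which kind of scheduler is used, scheduling a task always involves the following steps:

- Save the execution state (for example, the CPU register values) of the current task.
- Choose the next task to run.
- Restore the saved state of the chosen task and resume its execution.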
With a scheduler in place, the operating system can run tens of thousands of processes and threads on limited hardware resources.
We now have a basic understanding of OS scheduling, and it seems to work fine in most cases. Next, let's see how it performs in extreme scenarios. Free software developer Jim Blandy ran an interesting test to measure how long a context switch takes on Linux. The test creates 500 threads, connects them in a chain with pipes, and then passes a one-byte message from one end of the chain to the other. The whole test runs 10,000 iterations to get a stable result. The result shows that a thread context switch takes around 1.7 µs, compared with 0.2 µs for a Rust async task switch.
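The shape of that experiment can be sketched with the standard library alone. This is not Blandy's code. As an assumption, it uses `std::sync::mpsc` channels instead of pipes and smaller sizes, and `run_chain` is a hypothetical helper. The measured time per hop includes channel overhead as well as thread wake-up and context switching. The numbers it prints will therefore differ from the figures quoted above, and from machine to machine.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: builds a chain of `n` relay threads connected by channels,
/// pushes `iters` one-byte messages through it, and returns the last byte that
/// reached the tail together with the total elapsed time.
fn run_chain(n: usize, iters: usize) -> (u8, Duration) {
    let (head_tx, mut rx) = mpsc::channel::<u8>();
    let mut relays = Vec::with_capacity(n);
    for _ in 0..n {
        let (tx, next_rx) = mpsc::channel::<u8>();
        let prev_rx = rx; // this relay reads from the previous link...
        rx = next_rx; // ...and the next relay (or the tail) reads from this one
        relays.push(thread::spawn(move || {
            // Forward bytes until the upstream sender is dropped.
            for byte in prev_rx {
                if tx.send(byte).is_err() {
                    break;
                }
            }
        }));
    }

    let start = Instant::now();
    let mut last = 0;
    for i in 0..iters {
        head_tx.send(i as u8).unwrap();
        last = rx.recv().unwrap(); // block until the byte reaches the tail
    }
    let elapsed = start.elapsed();

    drop(head_tx); // closing the head lets every relay loop finish in turn
    for relay in relays {
        relay.join().unwrap();
    }
    (last, elapsed)
}

fn main() {
    let (n, iters) = (100, 1_000);
    let (_, elapsed) = run_chain(n, iters);
    // Each message crosses n + 1 channels on its way to the tail.
    let hops = (iters * (n + 1)) as u32;
    println!("about {:?} per hop", elapsed / hops);
}
```

Every hop forces the receiving thread to be woken and scheduled. The cost per hop therefore approximates the per-switch overhead that lightweight async tasks try to avoid.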
This is the first time we mention the "Rust async task", which is Rust's concrete implementation of coroutines. Coroutines are lightweight tasks for non-preemptive multitasking whose execution can be suspended and resumed. Usually, the task itself decides when to suspend, and then waits for a notification to resume. To suspend and resume a task's execution flow, its execution state must be saved, just as the OS does. Saving CPU register values is easy for the OS, but not for applications. Rust instead saves the state in a state machine, and the machine can only be suspended and resumed at the valid states of that machine. In Rust, this state machine is called a "Future".
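A hand-written state machine makes the idea concrete. In this hypothetical example, `Adder` and `resume` are illustrative names, not a Rust API. A coroutine needs two inputs before it can produce their sum. Each enum variant is a valid suspension point, and the data a variant carries is exactly the state that must survive while the coroutine is suspended. It plays the role that saved registers and the stack play for an OS thread.

```rust
/// Hypothetical coroutine as an explicit state machine: it suspends until it
/// has received two inputs, then finishes with their sum.
#[derive(Debug, PartialEq)]
enum Adder {
    WaitingFirst,
    WaitingSecond { first: u32 }, // the first input is kept here while suspended
    Done(u32),
}

impl Adder {
    /// Resumes the machine with one input, moving it to its next valid state.
    fn resume(self, input: u32) -> Adder {
        match self {
            Adder::WaitingFirst => Adder::WaitingSecond { first: input },
            Adder::WaitingSecond { first } => Adder::Done(first + input),
            done @ Adder::Done(_) => done, // already finished: ignore extra input
        }
    }
}

fn main() {
    let machine = Adder::WaitingFirst;
    let machine = machine.resume(2); // suspended again, remembering `first: 2`
    println!("{machine:?}"); // WaitingSecond { first: 2 }
    let machine = machine.resume(3); // resumed into the final state
    println!("{machine:?}"); // Done(5)
}
```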
As we all know, a `Future` is the data structure returned by an async function, and an async block is also a future. When we obtain a future, it does nothing by itself: it is just a plan, a blueprint telling us what it is going to do. Let's look at the example below:
```rust
async fn async_fn() -> u32 {
    0
}
```
We can't see any `Future` in the function definition, but the compiler translates the function signature into one that returns a `Future`:
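```rust
use std::future::Future;
// Roughly the signature the compiler produces for `async fn async_fn() -> u32`:
fn async_fn() -> impl Future<Output = u32> {
    async move { 0 } // stands in for the compiler-generated state machine
}
```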
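Two claims can be checked by polling the futures by hand: an async function is lazy, and it behaves like a plain function returning `impl Future`. In this sketch, `RAN`, `async_fn_desugared` and `poll_once` are hypothetical names, and `async_fn` is modified with a side effect so we can observe when its body runs. The no-op waker built from `std::task::Wake` is only sufficient because these futures complete on their first poll.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical flag recording whether the async body has started running.
static RAN: AtomicBool = AtomicBool::new(false);

async fn async_fn() -> u32 {
    RAN.store(true, Ordering::SeqCst);
    0
}

// Hand-written equivalent: a plain function returning `impl Future`.
fn async_fn_desugared() -> impl Future<Output = u32> {
    async move { 0 }
}

// A waker that does nothing; enough here because both futures finish on the first poll.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Polls a future exactly once and reports the result.
fn poll_once<F: Future>(fut: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut: Pin<Box<F>> = Box::pin(fut);
    fut.as_mut().poll(&mut cx)
}

fn main() {
    let fut = async_fn(); // only builds the state machine
    assert!(!RAN.load(Ordering::SeqCst)); // the body has not run yet
    assert_eq!(poll_once(fut), Poll::Ready(0));
    assert!(RAN.load(Ordering::SeqCst)); // polling ran the body
    assert_eq!(poll_once(async_fn_desugared()), Poll::Ready(0));
}
```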
The Rust compiler does us a great favor by generating the state machine for us. Here is the `Future` API from the standard library:
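```rust
use std::pin::Pin;
use std::task::Context;

// The `Future` trait, as defined in `std::future`:
pub trait Future {
    type Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}

// The `Poll` type, as defined in `std::task`:
pub enum Poll<T> {
    Ready(T),
    Pending,
}
```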
The `poll` function tries to drive the state machine forward until the final result, of type `Output`, is returned. The state machine is a black box to the caller of `poll`: `Poll::Pending` means it has not reached its final state, and `Poll::Ready(T)` means it has. Whenever `Poll::Pending` is returned, the coroutine is suspended, and every call to `poll` tries to resume it.
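The sketch below makes the `Pending`/`Ready` protocol concrete with a hand-written `Future` and a minimal single-future driver. `Countdown`, `ThreadWaker` and `block_on` are hypothetical names. `Countdown` returns `Poll::Pending` a fixed number of times and wakes itself immediately each time. A real future would not do that: it would arrange for the waker to be called only when the awaited event, such as I/O readiness, actually happens. `block_on` parks the current thread whenever the future is pending and relies on the waker to unpark it.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Hypothetical future: returns `Pending` `remaining` times, then `Ready`
/// with the total number of times it was polled.
struct Countdown {
    remaining: u32,
    polls: u32,
}

impl Future for Countdown {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        self.polls += 1;
        if self.remaining == 0 {
            return Poll::Ready(self.polls); // final state reached
        }
        self.remaining -= 1;
        // Suspend, but ask to be polled again right away.
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Waker that unparks the thread running `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Hypothetical minimal driver: polls one future until it is `Ready`.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            // Sleep until some waker calls `unpark` on this thread.
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    let polls = block_on(Countdown { remaining: 3, polls: 0 });
    println!("ready after {polls} polls"); // ready after 4 polls
}
```

`thread::park` may return spuriously, which is harmless here because the loop simply polls again.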
Since `Future`s are state machines, there must be a driver that pushes the machine state forward. We could write the driver by hand, `poll`ing the `Future`s one by one until we get the final result. However, that work should be written once and reused everywhere, and this is where the `runtime` comes in. A Rust async runtime handles the following tasks:
- Driving the `Future`s forward.
- Storing the suspended `Future`s.
- Waking up the notified `Future`s.

In this chapter, we learned that Rust async is a way to schedule tasks, and that a task's execution state is stored in a state machine named `Future`. In the next chapters, we'll discuss how the compiler automatically generates `Future`s and how it optimizes them.