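For the basics of synchronous and asynchronous code, and how to write each, see the previous post:
Python coroutines and async calls, squeezing out your program's idle time: rewriting ordinary programs as async (1)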
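Because requests itself can only send synchronous requests, we use httpx, which supports async, to access the sites. To compare the two styles, we loop over a list of websites, first synchronously and then asynchronously.
And yes, an asynchronous operation needs the function or third-party library itself to support async. That is why the previous post used await asyncio.sleep(1) rather than await time.sleep(1).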
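To make the point about library support concrete, here is a minimal sketch using only the standard library. The helper names (blocking_wait, cooperative_wait, timed) are made up for illustration. It runs two coroutines side by side, once with time.sleep and once with asyncio.sleep. Note that writing await time.sleep(1) would not help either: time.sleep blocks for the full second, returns None, and awaiting None then raises a TypeError.

```python
import asyncio
import time


async def blocking_wait(name):
    # time.sleep() is an ordinary blocking call: it holds the event loop,
    # so no other coroutine can run while it sleeps.
    time.sleep(0.2)
    return name


async def cooperative_wait(name):
    # asyncio.sleep() is awaitable: it hands control back to the event loop,
    # so other coroutines can make progress during the wait.
    await asyncio.sleep(0.2)
    return name


async def timed(worker):
    # Run two copies of the worker concurrently and measure the wall time.
    start = time.perf_counter()
    await asyncio.gather(worker("a"), worker("b"))
    return time.perf_counter() - start


blocking_elapsed = asyncio.run(timed(blocking_wait))
cooperative_elapsed = asyncio.run(timed(cooperative_wait))
print(f"time.sleep:    {blocking_elapsed:.2f}s")     # the two waits add up
print(f"asyncio.sleep: {cooperative_elapsed:.2f}s")  # the two waits overlap
```

With the blocking call the two waits run back to back, so the total is about twice a single wait. With the awaitable call they overlap, and the total stays close to a single wait.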
Take a program that does its work in a loop, such as the synchronous code below:
from datetime import datetime

import httpx


def get_url(url):
    # The header value must not repeat the "User-Agent:" prefix
    headers = {"User-Agent": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"}
    req_result = httpx.get(url, headers=headers)
    print('GET URL {} , time {}'.format(url, datetime.now()))
    if req_result.status_code == 200:
        return req_result.text[:20]  # return only the first 20 characters
    else:
        return None


def main():
    start_time = datetime.now()
    url_list = [
        "https://www.baidu.com",
        "https://www.github.com",
        "https://www.bilibili.com",
        "https://zhuanlan.zhihu.com/"
    ]
    # Collect the returned data; each request blocks until it has finished
    result_dict = {}
    for _url in url_list:
        result_dict[_url] = get_url(_url)
    end_time = datetime.now()
    print("Elapsed:", end_time - start_time)
    print(result_dict)


if __name__ == '__main__':
    main()
The output:
GET URL https://www.baidu.com , time 2022-07-28 08:44:17.898791
GET URL https://www.github.com , time 2022-07-28 08:44:19.001649
GET URL https://www.bilibili.com , time 2022-07-28 08:44:19.925008
GET URL https://zhuanlan.zhihu.com/ , time 2022-07-28 08:44:20.156802
Elapsed: 0:00:03.159291
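{'https://www.baidu.com': '
The asynchronous version:
import asyncio
from datetime import datetime

import httpx


async def get_by_request(url):
    headers = {"User-Agent": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"}
    # Assumption: the original client setup is not shown, so an
    # httpx.AsyncClient context manager is used here
    async with httpx.AsyncClient() as client:
        req = await client.get(url, headers=headers)
    print('GET URL {} , time {}'.format(url, datetime.now()))
    if req.status_code == 200:
        return req.text[:20]  # return only the first 20 characters
    else:
        return None


def main():
    start_time = datetime.now()
    url_list = [
        "https://www.baidu.com",
        "https://www.github.com",
        "https://www.bilibili.com",
        "https://zhuanlan.zhihu.com/"
    ]
    # Async loop: create one task per URL, then run them all concurrently.
    # new_event_loop() is used because calling get_event_loop() without a
    # running loop is deprecated in recent Python versions.
    tasks = []
    loop = asyncio.new_event_loop()
    for _url in url_list:
        tasks.append(loop.create_task(get_by_request(_url)))
    loop.run_until_complete(asyncio.wait(tasks))
    # Collect the returned data; tasks are in the same order as url_list
    result_dict = {}
    for index, _tk in enumerate(tasks):
        result_dict[url_list[index]] = _tk.result()
    loop.close()
    end_time = datetime.now()
    print("Elapsed:", end_time - start_time)
    print(result_dict)


if __name__ == '__main__':
    main()
The result:
GET URL https://zhuanlan.zhihu.com/ , time 2022-07-28 08:47:18.548526
GET URL https://www.baidu.com , time 2022-07-28 08:47:18.738401
GET URL https://www.github.com , time 2022-07-28 08:47:19.293389
GET URL https://www.bilibili.com , time 2022-07-28 08:47:19.369106
Elapsed: 0:00:01.224891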
{'https://www.baidu.com': '}
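The two elapsed times above can be compared directly. In the synchronous log each request is printed only after the previous one has finished, so the waits add up. In the asynchronous log all four requests are in flight at once, so the total stays close to the slowest single request. The short calculation below uses only the two elapsed values printed above. They come from a single run each, so treat the ratio as a rough indication rather than a benchmark.

```python
from datetime import timedelta

# Elapsed times copied from the two runs shown above.
sync_elapsed = timedelta(seconds=3, microseconds=159291)
async_elapsed = timedelta(seconds=1, microseconds=224891)

# Dividing one timedelta by another gives a plain float ratio.
speedup = sync_elapsed / async_elapsed
saved = sync_elapsed - async_elapsed

print(f"speedup: {speedup:.2f}x")
print(f"time saved: {saved}")
```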
for index, _tk in enumerate(tasks):
    result_dict[url_list[index]] = _tk.result()
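The loop above reads each task's result by index, which works because the tasks list is in the same order as url_list even though the requests finish in a different order. As an alternative sketch, not the approach used in this post, asyncio.gather returns results in the order the awaitables were passed in, and asyncio.run creates and closes the event loop for you. To keep the example self-contained, fake_fetch and the example.com URLs below are hypothetical stand-ins that simulate latency with asyncio.sleep; in real code the coroutine would be something like get_by_request.

```python
import asyncio

# Hypothetical stand-ins for real requests: each "URL" maps to a fake latency.
FAKE_LATENCY = {
    "https://a.example": 0.3,
    "https://b.example": 0.1,
    "https://c.example": 0.2,
}
finish_order = []


async def fake_fetch(url):
    # Simulates the network wait; a real version would await an httpx request.
    await asyncio.sleep(FAKE_LATENCY[url])
    finish_order.append(url)
    return "body of " + url


async def fetch_all(urls):
    # gather() runs the coroutines concurrently and returns their results
    # in the order they were passed in, not in completion order.
    results = await asyncio.gather(*(fake_fetch(u) for u in urls))
    return dict(zip(urls, results))


url_list = list(FAKE_LATENCY)
result_dict = asyncio.run(fetch_all(url_list))
print(finish_order)       # completion order, fastest first
print(list(result_dict))  # same order as url_list
```

Because the results come back already aligned with url_list, zip pairs them up without any index bookkeeping.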