• Python异步爬虫(aiohttp版)


    异步协程不太了解的话可以去看我上篇博客:https://www.cnblogs.com/Red-Sun/p/16934843.html
    PS:本博客是个人笔记分享,不需要扫码加群或必须关注什么的(如果外站需要加群或关注的可以直接去我主页查看)
    欢迎大家光临ヾ(≧▽≦*)o我的博客首页https://www.cnblogs.com/Red-Sun/

    1.requests请求

    # -*- coding: utf-8 -*-
    # @Time    : 2022/12/6 16:03
    # @Author  : 红后
    # @Email   : not_enabled@163.com
    # @blog    : https://www.cnblogs.com/Red-Sun
    # @File    : 实例1.py
    # @Software: PyCharm
    import aiohttp, asyncio
    
    
    async def aiohttp_requests(url):  # aiohttp的requests函数
        async with aiohttp.request("GET", url=url) as response:
            return await response.text(encoding='UTF-8')
    
    
    async def main():  # 主函数用于异步函数的启动
        url = 'https://www.baidu.com'
        html = await aiohttp_requests(url)  # await修饰异步函数
        print(html)
    
    
    if __name__ == '__main__':
        loop = asyncio.get_event_loop()
        loop.run_until_complete(main())
    
    

    2.session请求

    GET:

    # -*- coding: utf-8 -*-
    # @Time    : 2022/12/6 16:33
    # @Author  : 红后
    # @Email   : not_enabled@163.com
    # @blog    : https://www.cnblogs.com/Red-Sun
    # @File    : 实例2.py
    # @Software: PyCharm
    import aiohttp, asyncio
    
    
    async def aiohttp_requests(url):  # aiohttp的requests函数
        async with aiohttp.ClientSession() as session:  # 声明了一个支持异步的上下文管理器
            async with session.get(url) as response:
                return await response.text(encoding='UTF-8')
    
    
    async def main():  # 主函数用于异步函数的启动
        url = 'https://www.baidu.com'
        html = await aiohttp_requests(url)  # await修饰异步函数
        print(html)
    
    
    if __name__ == '__main__':
        loop = asyncio.get_event_loop()
        loop.run_until_complete(main())
    


    其中aiohttp还有post,put, delete...等一系列请求(PS:一般情况下只需要创建一个session,然后使用这个session执行所有的请求。)
    PSOT:传参

    async def aiohttp_requests(url):  # aiohttp的requests函数
        async with aiohttp.ClientSession() as session:
            data = {'key': 'value'}
            async with session.post(url=url, data=data) as response:
                return await response.text(encoding='UTF-8')
    

    PS:这种传参传递的数据将会被转码,如果不想被转码可以直接提交字符串data=str(data)

    附:关于session请求数据修改操作

    1.cookies

    自定义cookies应该放在ClientSession中,而不是session.get()中

    async def aiohttp_requests(url):  # aiohttp的requests函数
        cookies = {'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'}
        async with aiohttp.ClientSession(cookies=cookies) as session:
            async with session.get(url) as response:
                return await response.text(encoding='UTF-8')
    

    2.headers

    放在自定义的headers跟正常的requests一样放在session.get()中

    async def aiohttp_requests(url):  # aiohttp的requests函数
        async with aiohttp.ClientSession() as session:
            headers = {'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'}
            async with session.get(url=url, headers=headers) as response:
                return await response.text(encoding='UTF-8')
    

    3.timeout

    默认响应时间为5分钟,通过timeout可以重新设定,其放在session.get()中

    async def aiohttp_requests(url):  # aiohttp的requests函数
        async with aiohttp.ClientSession() as session:
            async with session.get(url=url, timeout=60) as response:
                return await response.text(encoding='UTF-8')
    

    4.proxy

    当然代理也是支持的在session.get()中配置

    async def aiohttp_requests(url):  # aiohttp的requests函数
        async with aiohttp.ClientSession() as session:
            async with session.get(url=url, proxy="http://some.proxy.com") as response:
                return await response.text(encoding='UTF-8')
    

    需要授权的代理

    async def aiohttp_requests(url):  # aiohttp的requests函数
        async with aiohttp.ClientSession() as session:
            proxy_auth = aiohttp.BasicAuth('user', 'pass')  # 用户,密码
            async with session.get(url=url, proxy="http://some.proxy.com", proxy_auth=proxy_auth) as response:
                return await response.text(encoding='UTF-8')
    

    或者

    async def aiohttp_requests(url):  # aiohttp的requests函数
        async with aiohttp.ClientSession() as session:
            async with session.get(url=url, proxy='http://user:pass@some.proxy.com') as response:
                return await response.text(encoding='UTF-8')
    

    报错处理

    错误:RuntimeError: Event loop is closed


    报错原因是使用了asyncio.run(main())来运行程序
    看到别个大佬的总结是asyncio.run()会自动关闭循环,并且调用_ProactorBasePipeTransport.__del__报错, 而asyncio.run_until_complete()不会。
    第一种解决方法换成如下代码运行

    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    
    

    第二种重写方法以保证run()的运行

    from functools import wraps
    
    from asyncio.proactor_events import _ProactorBasePipeTransport
    
    def silence_event_loop_closed(func):
        @wraps(func)
        def wrapper(self, *args, **kwargs):
            try:
                return func(self, *args, **kwargs)
            except RuntimeError as e:
                if str(e) != 'Event loop is closed':
                    raise
        return wrapper
    
    _ProactorBasePipeTransport.__del__ = silence_event_loop_closed(_ProactorBasePipeTransport.__del__)
    

    __EOF__

  • 本文作者: 红 后
  • 本文链接: https://www.cnblogs.com/Red-Sun/p/16956032.html
  • 关于博主: 评论和私信会在第一时间回复。或者直接私信我。
  • 版权声明: 本博客所有文章除特别声明外,均采用 BY-NC-SA 许可协议。转载请注明出处!
  • 声援博主: 如果您觉得文章对您有帮助,可以点击文章右下角推荐一下。
  • 相关阅读:
    计算机毕业设计(附源码)python疫情状态下的图书馆座位预约系统
    Map集合中,当添加一个键值对元素时,HashMap发生了什么?
    sudo in Linux
    java变量的作用域
    431. 将 N 叉树编码为二叉树 DFS
    软件项目验收测试范围和流程,这些你都知道吗?
    谷歌宣布:今年将Android 12L系统交付于三星、联想和微软
    【操作系统——内存基本分页存储管理】
    软件测试用里篇
    Windows下Python安装FDFS-client错误问题
  • 原文地址:https://www.cnblogs.com/Red-Sun/p/16956032.html