• Python crawler prints status code 521 and the returned data is garbled?


    Crawler code:
    import requests

    # Request headers copied from the browser
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36',
        'Referer': 'https://www1.rmfysszc.gov.cn/projects.shtml?dh=3&gpstate=1&wsbm_slt=1'
    }
    # Form fields sent with the POST request
    form_data = {
        "type": "0",
        "name": "",
        "area": "河南省",
        "city": "河南省",
        "city1": "==请选择==",
        "city2": "==请选择==",
        "xmxz": "0",
        "state": "0",
        "money": "",
        "money1": "",
        "number": "0",
        "fid1": "",
        "fid2": "",
        "fid3": "",
        "order": "0",
        "page": "1",
        "include": "0"
    }
    response = requests.post('https://www1.rmfysszc.gov.cn/ProjectHandle.shtml', data=form_data, headers=headers)
    print(response.status_code)
    print(response.text)
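    Printed result: the response body is garbled.

    Printing the status code afterwards showed that it was 521.
    Solution:
    Add the web page's cookie to the headers in the crawler code.
    Note: even if the cookie has already been added, you need to obtain a fresh cookie after the network connection drops.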
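The note above implies that the cookie is short-lived, so it helps if the crawler notices when the cookie has gone stale instead of silently printing a garbled body. The following is a minimal sketch of that check, assuming (as in this post) that status 521 is the signal. `CookieExpiredError` and `check_response` are hypothetical names introduced for illustration; they are not part of requests.

```python
class CookieExpiredError(RuntimeError):
    """Raised on status 521, which in this post meant a missing or stale cookie."""


def check_response(status_code: int, body: str) -> str:
    """Return the body if the request was accepted, otherwise raise.

    Assumption: 521 is the status that signals a cookie problem on this
    site; other sites may use a different code.
    """
    if status_code == 521:
        raise CookieExpiredError(
            "Got status 521: copy a fresh Cookie header from the browser and retry."
        )
    if status_code != 200:
        raise RuntimeError(f"Unexpected status code: {status_code}")
    return body


if __name__ == "__main__":
    # Stand-in values; with requests this would be
    # check_response(response.status_code, response.text)
    try:
        check_response(521, "garbled body")
    except CookieExpiredError as exc:
        print(exc)
```

Raising a dedicated exception keeps the "refresh the cookie" case separate from other failures, so a calling script can stop and ask for a new cookie rather than parsing a garbled body.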
    Modified code:
    import requests

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36',
        'Referer': 'https://www1.rmfysszc.gov.cn/projects.shtml?dh=3&gpstate=1&wsbm_slt=1',
        # Cookie copied from the browser; it expires, so replace it with a current value
        'Cookie': 'Cookies-01=78968004; ASP.NET_SessionId=kscnfewrtce2cj1gd3y1oefd; __jsluid_s=3626e7c26101665bc4ae1157ce1dbf7b; Hm_lvt_5698cdfa8b95bb873f5ca4ecf94ac150=1709957130; __jsl_clearance_s=1709957570.246|0|hWDDOOVhkDy6L7BtCygGoT9x5YE%3D; Hm_lpvt_5698cdfa8b95bb873f5ca4ecf94ac150=1709957572'
    }
    # Form fields sent with the POST request (unchanged)
    form_data = {
        "type": "0",
        "name": "",
        "area": "河南省",
        "city": "河南省",
        "city1": "==请选择==",
        "city2": "==请选择==",
        "xmxz": "0",
        "state": "0",
        "money": "",
        "money1": "",
        "number": "0",
        "fid1": "",
        "fid2": "",
        "fid3": "",
        "order": "0",
        "page": "1",
        "include": "0"
    }
    response = requests.post('https://www1.rmfysszc.gov.cn/ProjectHandle.shtml', data=form_data, headers=headers)
    print(response.status_code)
    print(response.text)
    Printed result: with the cookie added, the request returns the data normally.
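Because the cookie has to be replaced whenever it expires, it can be convenient to keep the raw string copied from the browser in one place and convert it to a dict. The sketch below shows one way to do that using only the standard library. `parse_cookie_header` is a hypothetical helper name, and the sample cookie values are placeholders rather than a working cookie.

```python
def parse_cookie_header(cookie_header: str) -> dict:
    """Split a browser-style 'name=value; name2=value2' string into a dict."""
    cookies = {}
    for pair in cookie_header.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # skip empty segments such as a trailing ';'
        # Split on the first '=' only, since values may contain '=' themselves.
        name, _, value = pair.partition("=")
        cookies[name.strip()] = value.strip()
    return cookies


if __name__ == "__main__":
    # Placeholder values, not a working cookie for any site.
    raw = "ASP.NET_SessionId=abc123; __jsl_clearance_s=1700000000.0|0|xyz%3D"
    print(parse_cookie_header(raw))
```

The resulting dict can be passed to `requests.post` through its `cookies=` parameter in place of the `'Cookie'` entry in `headers`; this should send the same cookies while keeping them separate from the other headers.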

  • Original article: https://blog.csdn.net/m0_74972727/article/details/136581529