• Python crawler prints status code 521 and the returned data is garbled?


    Crawler code:
    import requests

    # Browser-like request headers; no Cookie is sent at this point
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36',
        'Referer': 'https://www1.rmfysszc.gov.cn/projects.shtml?dh=3&gpstate=1&wsbm_slt=1'
    }

    # Form fields of the search request.
    # "河南省" is Henan Province; "==请选择==" is the site's "please select" placeholder.
    form_data = {
        "type": "0",
        "name": "",
        "area": "河南省",
        "city": "河南省",
        "city1": "==请选择==",
        "city2": "==请选择==",
        "xmxz": "0",
        "state": "0",
        "money": "",
        "money1": "",
        "number": "0",
        "fid1": "",
        "fid2": "",
        "fid3": "",
        "order": "0",
        "page": "1",
        "include": "0"
    }

    response = requests.post('https://www1.rmfysszc.gov.cn/ProjectHandle.shtml', data=form_data, headers=headers)
    print(response.status_code)
    print(response.text)
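    Printed response:

    Printing the status code afterwards showed that it was 521.
    Solution:
    Add the web page's cookie to the headers in the crawler code.
    Note: if the cookie has already been added and the network connection drops, the cookie has to be obtained again.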
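
The note above can be sketched as a small check that makes a stale cookie fail loudly instead of printing garbled text. This is only an illustration, not the author's code: the names `parse_cookie_header` and `post_with_cookie` are hypothetical, `session` is assumed to be a `requests.Session` (or any object with a compatible `post` method), and treating 521 as "cookie missing or expired" is based solely on the behavior this post reports for this site.

```python
def parse_cookie_header(raw_cookie: str) -> dict:
    """Turn a cookie string copied from the browser into a name -> value dict."""
    cookies = {}
    for pair in raw_cookie.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep and name:  # skip empty or malformed fragments
            cookies[name] = value
    return cookies


def post_with_cookie(session, url, form_data, headers, raw_cookie):
    """POST the form with the browser cookie and raise if the site answers 521.

    Assumption: on this site a 521 response means the cookie is missing or
    no longer accepted, so a fresh one must be copied from the browser.
    """
    response = session.post(
        url,
        data=form_data,
        headers=headers,
        cookies=parse_cookie_header(raw_cookie),
        timeout=10,
    )
    if response.status_code == 521:
        raise RuntimeError("Got status 521: copy a fresh cookie from the browser and retry")
    return response
```

With the `requests` library installed, it could be called as `post_with_cookie(requests.Session(), url, form_data, headers, raw_cookie)`, where `raw_cookie` is the value of the Cookie request header copied from the browser's developer tools.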
    Modified code:
    import requests

    # Same headers as before, plus the Cookie copied from the browser
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36',
        'Referer': 'https://www1.rmfysszc.gov.cn/projects.shtml?dh=3&gpstate=1&wsbm_slt=1',
        'Cookie': 'Cookies-01=78968004; ASP.NET_SessionId=kscnfewrtce2cj1gd3y1oefd; __jsluid_s=3626e7c26101665bc4ae1157ce1dbf7b; Hm_lvt_5698cdfa8b95bb873f5ca4ecf94ac150=1709957130; __jsl_clearance_s=1709957570.246|0|hWDDOOVhkDy6L7BtCygGoT9x5YE%3D; Hm_lpvt_5698cdfa8b95bb873f5ca4ecf94ac150=1709957572'
    }

    # Form fields of the search request (unchanged)
    form_data = {
        "type": "0",
        "name": "",
        "area": "河南省",
        "city": "河南省",
        "city1": "==请选择==",
        "city2": "==请选择==",
        "xmxz": "0",
        "state": "0",
        "money": "",
        "money1": "",
        "number": "0",
        "fid1": "",
        "fid2": "",
        "fid3": "",
        "order": "0",
        "page": "1",
        "include": "0"
    }

    response = requests.post('https://www1.rmfysszc.gov.cn/ProjectHandle.shtml', data=form_data, headers=headers)
    print(response.status_code)
    print(response.text)
    Printed output:

  • Original post: https://blog.csdn.net/m0_74972727/article/details/136581529