• Crawling a conventional dynamic web page


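    1. Scrape the dynamic web page "http://www.ptpress.com.cn": collect the title, price, and author of each book in the Life section of the New Book Recommendations area, and save the results.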

    import requests
    import openpyxl

    # JSON endpoint that backs the "new book recommendations" list.
    # The bookTagId below is the one used here for the Life section.
    url = 'https://www.ptpress.com.cn/recommendBook/getRecommendBookListForPortal?bookTagId=d5cbb56d-09ef-41f5-9110-ced741048f5f'

    # The Cookie was copied from a browser session and will expire;
    # replace it with the value from your own session.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36 Edg/95.0.1020.44',
        'Cookie': 'gr_user_id=796019e3-dc58-40f5-a6df-892a38008bcd; '
                  'acw_tc=2760822416373059896443147efcf3dd457a5539d63a07fdafd12f3041cd93; '
                  'JSESSIONID=A0FD72E84771D06417CF145392DAA679; '
                  'gr_session_id_9311c428042bb76e=1a1d8cc2-0de9-4409-adc4-07de4cdb503f;'
                  ' gr_session_id_9311c428042bb76e_1a1d8cc2-0de9-4409-adc4-07de4cdb503f=true'
    }


    def json_detail(bookid):
        """Fetch the author and discounted price of one book by its ID."""
        url = 'https://www.ptpress.com.cn/bookinfo/getBookDetailsById'
        params = {
            'bookId': bookid,
        }
        # The detail endpoint is called with POST; bookId goes in the query string.
        response = requests.post(url=url, headers=headers, params=params)
        res = response.json()['data']
        author = res['author']
        discountPrice = res['discountPrice']
        print(res['bookName'], author, discountPrice)
        return author, discountPrice


    def save_excel(res):
        """Write one row per recommended book to an Excel workbook."""
        wb1 = openpyxl.Workbook()
        sheet = wb1.active
        sheet.title = "人民邮电新书推荐"  # "Posts & Telecom Press new book recommendations"
        title = ['书名', '作者', '价格']  # title, author, price
        sheet.append(title)
        # "book" replaces the original loop variable "re", which shadowed
        # the name of the standard-library re module.
        for book in res['data']:
            author, discountPrice = json_detail(book['bookId'])
            sheet.append([book['bookName'], author, discountPrice])
        wb1.save('生活类新书基本信息.xlsx')  # "Life category new books, basic info"


    # Fetch the recommendation list, then look up and save each book's details.
    response = requests.get(url=url, headers=headers)
    res = response.json()
    save_excel(res)
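
The per-book loop in the script above mixes network calls with row building, which makes it hard to check without hitting the site. As a minimal sketch, the row building can be pulled into a function that takes the detail lookup as an argument. The names `build_rows` and `fetch_detail` and the sample data are hypothetical and not part of the original script. The field names (`data`, `bookId`, `bookName`, `author`, `discountPrice`) are the ones the script reads from the responses.

```python
from typing import Any, Callable, Dict, List


def build_rows(listing: Dict[str, Any],
               fetch_detail: Callable[[str], Dict[str, Any]]) -> List[List[Any]]:
    """Return one [title, author, price] row per book in the listing.

    fetch_detail (hypothetical helper) maps a bookId to that book's
    detail dict; in the real crawler it would wrap the detail request.
    """
    rows = []
    for book in listing.get('data', []):
        detail = fetch_detail(book['bookId'])
        # .get() yields None instead of raising if a field is missing.
        rows.append([book['bookName'],
                     detail.get('author'),
                     detail.get('discountPrice')])
    return rows


# Made-up sample data standing in for the two JSON responses.
sample_listing = {'data': [{'bookId': 'id-1', 'bookName': 'Book A'},
                           {'bookId': 'id-2', 'bookName': 'Book B'}]}
sample_details = {'id-1': {'author': 'Author A', 'discountPrice': 39.8},
                  'id-2': {'author': 'Author B'}}

rows = build_rows(sample_listing, lambda book_id: sample_details[book_id])
for row in rows:
    print(row)
```

Each returned row can then be passed to `sheet.append(...)` as in the script, and the lookup function can be swapped for a stub when testing offline.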

    Crawl result:

  • Original article: https://blog.csdn.net/m0_74972727/article/details/133881511