爬虫小案例爬取塔某小说内容

单章小说下载：


import requests
import parsel
import re
#https://www.tadu.com/getPartContentByCodeTable/1004090/2 第二章链接
def get_response(url):
    headers = {
        "User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    }
    response = requests.get(url=url,headers=headers)
 
    return response
if __name__ == '__main__':
    url = 'https://www.tadu.com/getPartContentByCodeTable/1004090/1'
    response = get_response(url).json()
    content = response['data']['content']
    result = re.findall('(.*?)
',content)
    string = '\n'.join(result)
    print(string)
    with open('你那近三十的心酸.txt','w',encoding='utf-8') as f:
        f.write(string)

结果展现：

整本小说下载，下载第二章只要在url最后改成2即可

实现搜索界面展现：


def get_search(url,key):
    data = {
        'query': key,
    }
    headers = {
        "User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    }
    response = requests.post(url=url,data=data,headers=headers)
    return response
 
def get_booklist(response):
    selector = parsel.Selector(text=response)
    ul_list = selector.css('.bookList li')
    for ul in ul_list:
        href = ul.css('div.rtList a::attr(href)').get()
        title = ul.css('div.rtList a.bookNm::text').getall()
        title = ''.join(title)
        info = ul.css('div.rtList a.bookIntro::text').getall()
        print(href,title,info)
if __name__ == '__main__':
    url = 'https://www.tadu.com/search'
    key = '仙魔'
    resp = get_search(url,key).text
    # print(resp)
    get_booklist(resp)

结果展现：

可以通过搜索姐界面找到小说名称和小说链接，小说简介。

相关阅读:
JavaScript学习笔记
如何用度量数据驱动代码评审的改善
乐财业：打造财税服务的“硬核“竞争力
嵌入式分享合集63
Layui之动态生成选项卡与用户CRUD
「React深入」一文吃透虚拟DOM和diff算法
Linux shell printf命令小例子
【MHA】MySQL高可用MHA介绍8-常见错误以及解决
酷开科技丨酷开系统9.2：引领大屏智能化新纪元
全网最牛的pytest从0到1全套教程-pytest(2)-pytest-html测试报告

原文地址：https://blog.csdn.net/m0_57265868/article/details/139098790