• Nice! Save an HTML web page as a PDF with a single playwright command. So convenient!


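    Preface

    Today we'll look at how to fetch an HTML web page and save it as a PDF. The same approach also converts HTML that you wrote or generated yourself. Without further ado, let's get into the tutorial.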

    1. Installation

    Install with Conda:
    conda config --add channels conda-forge
    conda config --add channels microsoft
    conda install playwright
    playwright install
    
    Install with pip:
    # Install playwright
    pip install playwright
    # Install the browser binaries
    playwright install
    

    2. Saving a page as a PDF with the playwright command line

    playwright pdf <HTML file path or URL> <PDF output path>
    
    playwright pdf ./baidu.html ./baidu.pdf
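The same conversion can be sketched in Python for a local HTML file. This is a minimal sketch, not the CLI's actual implementation: the helper names `local_file_url` and `html_file_to_pdf` are made up for illustration, and it assumes the browsers were installed with `playwright install`.

```python
from pathlib import Path


def local_file_url(html_path: str) -> str:
    """Turn a local HTML path into an absolute file:// URL that page.goto() accepts."""
    return Path(html_path).resolve().as_uri()


def html_file_to_pdf(html_path: str, pdf_path: str) -> None:
    """Render a local HTML file in headless Chromium and save it as a PDF."""
    # Imported here so local_file_url() stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # launch() is headless by default; page.pdf() only works in headless Chromium
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(local_file_url(html_path))
        page.pdf(path=pdf_path)
        browser.close()
```

Calling `html_file_to_pdf("./baidu.html", "./baidu.pdf")` should produce roughly the same file as the command above.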
    
    • Use the --viewport-size option to render the page in a window of a different size
      playwright pdf --viewport-size=800,600 ./baidu.html ./baidu.pdf
      
    • Emulate geolocation, language, and time zone
      playwright pdf --timezone="Asia/Shanghai" --geolocation="30.890221,120.492348" --lang="zh-CN" ./baidu.html ./baidu.pdf
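For scripts, the command-line options above have close counterparts among the keyword arguments of `browser.new_context()`. The sketch below maps one to the other. The function names `context_options` and `save_pdf_with_emulation` are hypothetical, and granting the `geolocation` permission is an assumption about what a page needs in order to read the emulated position.

```python
def context_options(viewport_size: str, timezone: str, geolocation: str, lang: str) -> dict:
    """Translate CLI-style option strings into keyword arguments for browser.new_context()."""
    width, height = (int(v) for v in viewport_size.split(","))
    latitude, longitude = (float(v) for v in geolocation.split(","))
    return {
        "viewport": {"width": width, "height": height},
        "timezone_id": timezone,
        "geolocation": {"latitude": latitude, "longitude": longitude},
        "permissions": ["geolocation"],  # assumption: let the page read the emulated position
        "locale": lang,
    }


def save_pdf_with_emulation(url: str, pdf_path: str, options: dict) -> None:
    """Open url in a context created with the given options and save it as a PDF."""
    # Imported here so context_options() can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        context = browser.new_context(**options)
        page = context.new_page()
        page.goto(url)
        page.pdf(path=pdf_path)
        context.close()
        browser.close()
```

For example, `context_options("800,600", "Asia/Shanghai", "30.890221,120.492348", "zh-CN")` builds the settings used in the two commands above, and the resulting dict can be passed to `save_pdf_with_emulation` together with a URL and an output path.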
      

    3. Batch-saving web pages as PDF files with Python playwright

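    from playwright.sync_api import sync_playwright
    
    # Read the list of URLs to save, one per line, skipping blank lines
    with open('urls.txt', mode='rt', encoding='utf-8') as f:
        urls = [line.strip() for line in f if line.strip()]
    
    # Visit each URL in turn and save the page as a PDF
    # (page.pdf() only works in headless Chromium, which is the default for launch())
    with sync_playwright() as p:
        browser = p.chromium.launch()
        for i, url in enumerate(urls):
            context = browser.new_context()
            page = context.new_page()
            page.goto(url)
            page.pdf(path=f"{i}.pdf")
            context.close()
        browser.close()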
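The script above names its output `0.pdf`, `1.pdf`, and so on, which makes it hard to tell afterwards which file came from which URL. One possible variation, sketched here under assumptions, derives the file name from the URL and passes a few `page.pdf()` options. The helpers `pdf_name_for` and `save_all` are hypothetical names, and waiting for `networkidle` and printing on A4 with backgrounds are illustrative choices rather than part of the original script.

```python
import re
from urllib.parse import urlparse


def pdf_name_for(url: str, index: int) -> str:
    """Build a readable, filesystem-safe PDF name from the index and the URL."""
    parsed = urlparse(url.strip())
    raw = f"{parsed.netloc}{parsed.path}".strip("/")
    # Collapse anything that is not a letter, digit, dot, or hyphen into '_'
    slug = re.sub(r"[^A-Za-z0-9.-]+", "_", raw).strip("_")
    return f"{index}_{slug or 'page'}.pdf"


def save_all(urls: list[str]) -> None:
    """Save every URL as an A4 PDF, reusing a single page for all of them."""
    # Imported here so pdf_name_for() can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        for i, url in enumerate(urls):
            # Wait until the network is quiet so late-loading content is more likely included
            page.goto(url, wait_until="networkidle")
            page.pdf(path=pdf_name_for(url, i), format="A4", print_background=True)
        browser.close()
```

With this helper, `https://www.baidu.com/` at index 3 would be saved as `3_www.baidu.com.pdf`.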
    
  • Original article: https://blog.csdn.net/qq_37275405/article/details/133851501