Converting a PDF to TXT with Python

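import PyPDF2

pdffile = ").pdf"    # path of the input PDF
txtfile = "(1).txt"  # path of the output text file

with open(pdffile, "rb") as pdf:
    reader = PyPDF2.PdfReader(pdf)
    # `or ""` guards against a page that yields no text
    text = "".join(page.extract_text() or "" for page in reader.pages)

with open(txtfile, "w", encoding="utf-8") as txt:
    txt.write(text)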

Batch conversion

import os
import PyPDF2

pdf_path = os.path.join('.', '数据PDF')  # input folder containing the PDFs
txt_path = os.path.join('.', '数据TXT')  # output folder for the text files
os.makedirs(txt_path, exist_ok=True)

for name in os.listdir(pdf_path):
    base, ext = os.path.splitext(name)
    if ext.lower() != '.pdf':
        continue  # skip anything that is not a PDF
    pdffile = os.path.join(pdf_path, name)
    txtfile = os.path.join(txt_path, base + '.txt')
    print(txtfile)
    with open(pdffile, "rb") as pdf:
        reader = PyPDF2.PdfReader(pdf)
        text = "".join(page.extract_text() or "" for page in reader.pages)
    with open(txtfile, 'w', encoding='utf-8') as txt:
        txt.write(text)
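
The batch loop can also be split into two steps: first decide which files to convert and where each output goes, then do the conversion. Below is a minimal sketch of that idea using pathlib. The function names plan_conversions and convert_all are hypothetical and not part of PyPDF2, and skipping unreadable files is an assumption about the desired behavior.

```python
from pathlib import Path


def plan_conversions(pdf_dir, txt_dir):
    """Return (pdf_path, txt_path) pairs for every .pdf file in pdf_dir."""
    pdf_dir, txt_dir = Path(pdf_dir), Path(txt_dir)
    pairs = []
    for pdf_path in sorted(pdf_dir.iterdir()):
        # Match the extension case-insensitively and ignore subdirectories.
        if pdf_path.is_file() and pdf_path.suffix.lower() == ".pdf":
            pairs.append((pdf_path, txt_dir / (pdf_path.stem + ".txt")))
    return pairs


def convert_all(pdf_dir, txt_dir):
    """Convert every PDF in pdf_dir; return the names of files that failed."""
    import PyPDF2  # imported here so plan_conversions works without it

    Path(txt_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for pdf_path, txt_path in plan_conversions(pdf_dir, txt_dir):
        try:
            with open(pdf_path, "rb") as pdf:
                reader = PyPDF2.PdfReader(pdf)
                text = "".join(page.extract_text() or "" for page in reader.pages)
        except Exception as exc:  # one unreadable PDF should not stop the batch
            print(f"skipped {pdf_path.name}: {exc}")
            failed.append(pdf_path.name)
            continue
        txt_path.write_text(text, encoding="utf-8")
    return failed
```

With the folders from the script above, convert_all('数据PDF', '数据TXT') would write one .txt file per PDF and return a list of any files that could not be read.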

Original article: https://blog.csdn.net/qq_42830971/article/details/133828408