点击上方“Python爬虫与数据挖掘”,进行关注
回复“书籍”即可获赠Python从入门到进阶共10本电子书
今
日
鸡
汤
今当远离,临表涕零,不知所言。
Typora导出的PDF目录标题自动加编号
在Typora主题文件夹增加如下文件后,标题便自动加上了编号:
https://gitcode.net/as604049322/blog_data/-/blob/master/base.user.css
例如:
但是导出的PDF中,目录却没有编号:
这是我使用Python处理该文件,使其具有编号,完整代码如下:
- # 博客地址:https://blog.csdn.net/as604049322
- __author__ = '小小明-代码实体'
- __date__ = '2023/8/31'
-
- from PyPDF2 import PdfReader, PdfWriter
-
-
- def get_pdf_Bookmark(filename):
- "作者CSDN:https://blog.csdn.net/as604049322"
- if isinstance(filename, str):
- pdf_reader = PdfReader(filename)
- else:
- pdf_reader = filename
- pagecount = len(pdf_reader.pages)
- # 用保存每个标题id所对应的页码
- idnum2pagenum = {}
- for i in range(pagecount):
- page = pdf_reader.pages[i]
- idnum2pagenum[page.indirect_ref.idnum] = i
- # 保存每个标题对应的标签数据,包括层级,标题和页码索引(页码-1)
- bookmark = []
-
- def get_pdf_Bookmark_inter(outlines, tab=0):
- for outline in outlines:
- if isinstance(outline, list):
- get_pdf_Bookmark_inter(outline, tab + 1)
- else:
- bookmark.append(
- (tab, outline['/Title'], idnum2pagenum[outline.page.idnum]))
-
- get_pdf_Bookmark_inter(pdf_reader.outline)
- return bookmark
-
-
- def pdf_write_bookmark(bookmark, pdf_file, compress=True):
- pdf_reader = PdfReader(pdf_file)
- num_pages = len(pdf_reader.pages)
- pdf_writer = PdfWriter()
- for page in pdf_reader.pages:
- if compress:
- page.compress_content_streams()
- pdf_writer.add_page(page)
- # pdf_reader.
- last_cache = [None] * (max(bookmark, key=lambda x: x[0])[0] + 1)
- for tab, title, pagenum in bookmark:
- if pagenum >= num_pages:
- continue
- parent = last_cache[tab - 1] if tab > 0 else None
- indirect_id = pdf_writer.add_outline_item(title, pagenum, parent=parent)
- last_cache[tab] = indirect_id
- pdf_writer.page_mode = "/UseOutlines"
- with open(pdf_file, "wb") as out:
- pdf_writer.write(out)
- print("已成功将书签写入到", pdf_file)
-
-
- if __name__ == '__main__':
- file = r"C:\Users\sj\Desktop\集团管理层培训.pdf"
- bookmark = get_pdf_Bookmark(file)
- num_cache = [0] * 7
- last_tab = 0
- new_bookmark = []
- for tab, title, pagenum in bookmark:
- if tab > last_tab:
- num_cache[tab] = 1
- else:
- num_cache[tab] += 1
- new_title = title
- if not title[0].isdigit():
- new_title = ".".join(map(str, num_cache[:tab + 1])) + " " + title
- # print(tab, new_title, pagenum)
- new_bookmark.append((tab, new_title, pagenum))
- last_tab = tab
- pdf_write_bookmark(new_bookmark, file)
处理后的PDF目录就有编号了:
小伙伴们,快快用实践一下吧!如果在学习过程中,有遇到任何问题,欢迎加我好友,我拉你进Python学习交流群共同探讨学习。

------------------- End -------------------
往期精彩文章推荐:

欢迎大家点赞,留言,转发,转载,感谢大家的相伴与支持
想加入Python学习群请在后台回复【入群】
万水千山总是情,点个【在看】行不行
/今日留言主题/
随便说一两句吧~~