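bs4 is short for beautifulsoup4, that is, version 4 of Beautiful Soup.
It is a web-page information extraction library that pulls data out of HTML or XML files.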
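bs4 does not strictly depend on the lxml library: it also works with Python's built-in html.parser. lxml is an optional, faster parser (and the one bs4 needs for parsing XML), so it is worth installing alongside bs4.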
Command:
pip install bs4 -i https://pypi.douban.com/simple/
(1) Create a soup object
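from bs4 import BeautifulSoup

# Sample document: the "three sisters" example from the Beautiful Soup documentation
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""
# print(html_doc)

# Parse the document into a BeautifulSoup object.
# "lxml" requires the lxml package; use "html.parser" if it is not installed.
soup = BeautifulSoup(html_doc, "lxml")
print(soup)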
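
Creating the soup object with an explicit parser can be sketched as follows. This is only a minimal illustration: `sample_html` and `choose_parser` are hypothetical names introduced here, and the fallback to the built-in "html.parser" assumes lxml may not be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, kept short for illustration.
sample_html = (
    "<html><head><title>Demo page</title></head>"
    "<body><p class='intro'>Hello, <b>soup</b>!</p></body></html>"
)


def choose_parser():
    """Return "lxml" if it is installed, otherwise the built-in "html.parser"."""
    try:
        import lxml  # noqa: F401  (optional third-party parser)
        return "lxml"
    except ImportError:
        return "html.parser"


# Naming the parser explicitly avoids the warning bs4 emits when it has to guess.
soup = BeautifulSoup(sample_html, choose_parser())

print(type(soup).__name__)  # class name of the parsed object
print(soup.title.string)    # text inside <title>
print(soup.p.get_text())    # all text inside the first <p>
print(soup.prettify())      # indented view of the parse tree
```

Passing the parser name keeps the result the same from one machine to another, since otherwise bs4 picks whichever parser happens to be installed.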