


Code in test.html
    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <title>Title</title>
    </head>
    <body>
        <h1>Heading 1</h1>
        <h2>Heading 2</h2>
        <h3>Heading 3</h3>
        <h4>Heading 4</h4>

        <div id="content" class="default">
            <p>Paragraph</p>
            <a href="http://www.baidu.com">Baidu</a> <br/>
            <a href="http://www.iqiyi.com">iQIYI</a> <br/>
            <img src="https://www.python.org/static/img/python-logo.png" />
        </div>
    </body>
    </html>
Code in test.py
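    from bs4 import BeautifulSoup

    # Read the whole HTML file into a string.
    with open('./test.html', encoding='utf-8') as f:
        html_doc = f.read()

    # Parse with Python's built-in HTML parser.
    soup = BeautifulSoup(html_doc, 'html.parser')

    # Locate the <div> whose id attribute is "content".
    div_node = soup.find('div', id='content')
    print(div_node)
    print('=' * 20)

    # Print the tag name, href attribute, and text of every link in the div.
    links = div_node.find_all('a')
    for link in links:
        print(link.name, link['href'], link.get_text())

    # Print the src attribute of the first image in the div.
    img = div_node.find('img')
    print(img['src'])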
Output of running the code

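The lookup in test.py assumes that the `div` exists and that every `<a>` tag has an `href`. In Beautiful Soup, `find` returns `None` when nothing matches, and indexing a tag with a missing attribute (`link['href']`) raises `KeyError`. The following is a minimal sketch of a more defensive variant of the same extraction. The helper name `extract_links`, its `container_id` parameter, and the inline `SAMPLE_HTML` string are assumptions made for illustration and are not part of the original files.

```python
from bs4 import BeautifulSoup

# Hypothetical inline sample modeled on the <div id="content"> block of test.html,
# with one extra <a> tag that has no href to exercise the defensive path.
SAMPLE_HTML = """
<div id="content" class="default">
    <p>Paragraph</p>
    <a href="http://www.baidu.com">Baidu</a> <br/>
    <a href="http://www.iqiyi.com">iQIYI</a> <br/>
    <a>no href here</a>
    <img src="https://www.python.org/static/img/python-logo.png" />
</div>
"""


def extract_links(html, container_id="content"):
    """Return (href, text) pairs for links inside the div with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", id=container_id)
    if container is None:
        # find() returns None when no matching tag exists.
        return []
    links = []
    for a in container.find_all("a"):
        # Tag.get() returns None instead of raising KeyError for a missing attribute.
        href = a.get("href")
        if href is not None:
            links.append((href, a.get_text(strip=True)))
    return links


if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_HTML):
        print(href, text)
```

Using `a.get("href")` rather than `a["href"]` lets the loop skip anchors without a target instead of stopping on the first one, which is usually the safer choice when the HTML comes from a page you do not control.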