Web page parser: Beautiful Soup

Basic introduction

Basic usage

Simple example

Code in test.html


<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Title</title>
</head>
<body>
    <h1>Heading 1</h1>
    <h2>Heading 2</h2>
    <h3>Heading 3</h3>
    <h4>Heading 4</h4>

    <div id="content" class="default">
        <p>Paragraph</p>
        <a href="http://www.baidu.com">Baidu</a> <br/>
        <a href="http://www.iqiyi.com">iQIYI</a> <br/>
        <img src="https://www.python.org/static/img/python-logo.png" />
    </div>
</body>
</html>

Code in test.py


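from bs4 import BeautifulSoup

# Read the sample page as text
with open('./test.html', encoding='utf-8') as f:
    html_doc = f.read()

# Parse the page with Python's built-in HTML parser
soup = BeautifulSoup(html_doc, 'html.parser')

# Find the <div> whose id is "content"
div_node = soup.find('div', id='content')
print(div_node)
print('='*20)

# Print the tag name, href attribute, and text of each link in the div
links = div_node.find_all('a')
for link in links:
    print(link.name, link['href'], link.get_text())

# Print the src attribute of the first image in the div
img = div_node.find('img')
print(img['src'])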
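
The script above assumes that the div and every href attribute exist. As a minimal sketch of a more defensive variant: find() returns None when nothing matches, and link['href'] raises KeyError when the attribute is missing, so the helper below checks for both. The function name extract_links and its container_id parameter are hypothetical, introduced only for this illustration.

```python
from bs4 import BeautifulSoup


def extract_links(html_doc, container_id="content"):
    """Return (href, text) pairs for links inside the div with the given id.

    Hypothetical helper for illustration; not part of Beautiful Soup.
    """
    soup = BeautifulSoup(html_doc, "html.parser")
    container = soup.find("div", id=container_id)
    if container is None:  # find() returns None when nothing matches
        return []
    pairs = []
    for link in container.find_all("a"):
        href = link.get("href")  # None instead of KeyError if href is missing
        if href:
            pairs.append((href, link.get_text(strip=True)))
    return pairs
```

Calling extract_links(html_doc) with the text read from test.html should give one pair per link in the content div, and an empty list if the div is absent.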

Output
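
As a hedged illustration of what the script prints after the separator line, the sketch below embeds a copy of the content div as a string so that it runs without test.html. The name HTML_DOC and the English link text are assumptions made for this example. The trailing comments show the output expected for this inline markup.

```python
from bs4 import BeautifulSoup

# Assumed inline copy of the content div from test.html (English link text)
HTML_DOC = """
<div id="content" class="default">
    <p>Paragraph</p>
    <a href="http://www.baidu.com">Baidu</a> <br/>
    <a href="http://www.iqiyi.com">iQIYI</a> <br/>
    <img src="https://www.python.org/static/img/python-logo.png" />
</div>
"""

soup = BeautifulSoup(HTML_DOC, "html.parser")
div_node = soup.find("div", id="content")

# (tag name, href, text) for every link inside the div
link_rows = [(a.name, a["href"], a.get_text()) for a in div_node.find_all("a")]
img_src = div_node.find("img")["src"]

for row in link_rows:
    print(*row)
print(img_src)
# a http://www.baidu.com Baidu
# a http://www.iqiyi.com iQIYI
# https://www.python.org/static/img/python-logo.png
```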


Original article: https://blog.csdn.net/2301_77659011/article/details/132998594