[Python Skill Tree Co-building] A First Look at the requests-html Library

What is the requests-html module

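Install the requests-html module with pip install requests-html. There is no official Chinese translation of the documentation. While searching I did find one Chinese version of the manual, which is linked at the end of this article.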

Let's start with the official description of the library:

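Full JavaScript support! (Full JS support. The manual highlights this point, but beginners can ignore it for now.)
CSS Selectors (a.k.a jQuery-style, thanks to PyQuery). (Built on the pyquery library, so CSS selectors are supported.)
XPath Selectors, for the faint at heart. (XPath selectors are supported.)
Mocked user-agent (like a real web browser). (Mocks the User-Agent data, which is a nice touch.)
Automatic following of redirects. (Redirects are followed automatically.)
Connection–pooling and cookie persistence. (Connection pooling and persistent cookies.)
The Requests experience you know and love, with magical parsing abilities. (Well, I will leave this last one for you to interpret.)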

Only Python 3.6 is supported. That is the official statement, but in my testing versions above 3.6 work as well.

A simple use of the library looks like this:


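from requests_html import HTMLSession

# Create a session; it keeps cookies and pooled connections across requests
session = HTMLSession()

# Send a GET request
r = session.get('https://python.org/')

# Print the response object
print(r)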

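First import the HTMLSession class from the requests_html library, instantiate it, and call its get method to send the request. Printing r shows the resulting response object, and from there the built-in parser can be used to parse the data.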

Since the library parses html objects, it is worth checking which methods and attributes the corresponding html object provides.

The dir function lists them.


print(dir(r.html))
# The output begins as follows:
['__aiter__', '__anext__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__',
'__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__module__', '__ne__',
'__new__', '__next__', '__reduce__'
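
Most of the names dir returns are double-underscore internals, so a small filter helps when reading the list. Below is a minimal sketch. The public_members helper and the Page class are hypothetical, and Page is only a stand-in so the code runs without a network request; with requests-html you would pass r.html instead:

```python
def public_members(obj):
    """Return the names from dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


class Page:
    """Hypothetical stand-in object, not part of requests-html."""

    def __init__(self):
        self.url = "https://python.org/"
        self._cache = None  # private by convention, filtered out below

    def find(self, selector):
        return []


names = public_members(Page())
print(names)
```

Calling public_members(r.html) on a real response should leave only the attributes and methods meant for everyday use.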


Original article: https://blog.csdn.net/AudiA6LV6/article/details/126957350