Day 24: Selenium Basics

Switching tabs (CNKI as the example)

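from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

b = Chrome()
# 1. Open CNKI
b.get('http://kns.cnki.net/')

# 2. Type the search term '数据分析' (data analysis) and press Enter
search = b.find_element(By.ID, 'txt_search')
search.send_keys('数据分析\n')
time.sleep(3)  # pause three seconds so the results can load (and so you can watch what happens)

# 3. Get the <a> tags that hold the paper titles
name_a_list = b.find_elements(By.CSS_SELECTOR, '.result-table-list .fz14')
print(len(name_a_list))

# 4. Click through to each detail page
for x in name_a_list:
    # Clicking a title opens the detail page in a new tab
    x.click()

    # Switch tabs so the browser object points at the new page.
    # b.window_handles lists the handles of every tab currently open.
    b.switch_to.window(b.window_handles[-1])

    time.sleep(1)
    soup = BeautifulSoup(b.page_source, 'lxml')
    try:
        print(soup.select_one('.doc-top .row').text)
    except AttributeError:
        # select_one returned None, so the page has no abstract block
        print('No abstract')
    print('--------- separator -----------')

    # Close the detail tab and switch back to the results tab
    b.close()
    b.switch_to.window(b.window_handles[0])

input('end:')
b.close()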
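
The loop above assumes the newly opened tab is always the last entry in b.window_handles. A slightly more defensive sketch is to remember the handles before the click and pick whichever handle is new afterwards. The helper names find_new_handle and click_and_switch are made up for illustration; they rely only on the driver's window_handles, current_window_handle and switch_to.window, and they assume the new tab has already appeared by the time the click returns.

```python
def find_new_handle(handles_before, handles_after):
    """Return the first handle in handles_after that was not open before, or None."""
    known = set(handles_before)
    for handle in handles_after:
        if handle not in known:
            return handle
    return None


def click_and_switch(driver, element):
    """Click element, switch to the tab it opened (if any), return the original handle."""
    original = driver.current_window_handle
    before = list(driver.window_handles)
    element.click()
    new_handle = find_new_handle(before, driver.window_handles)
    if new_handle is not None:
        driver.switch_to.window(new_handle)
    return original
```

Inside the loop this could replace the click-and-switch pair: call main = click_and_switch(b, x), read the page, then after b.close() go back with b.switch_to.window(main).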

Scrolling (JD.com as the example)

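from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

b = Chrome()  # assumes chromedriver is discoverable (on PATH, or located by Selenium Manager)
b.get('http://www.jd.com')

# Type '手机' (phone) into the search box and press Enter
b.find_element(By.ID, 'key').send_keys('手机\n')
time.sleep(1)
for i in range(6):
    b.execute_script('window.scrollBy(0,500)')  # scroll the window down 500 pixels, six times in total
    time.sleep(1)  # pause one second after each scroll

soup = BeautifulSoup(b.page_source, 'lxml')  # parse the current page
goods_li = soup.select('#J_goodsList>ul>li.gl-item')  # all product <li> tags on the current page
print(len(goods_li))  # how many products the current page holds

time.sleep(2)
b.close()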
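
The fixed "6 scrolls of 500 pixels" above works only if you already know how long the page is. As a rough alternative sketch, the loop can keep scrolling until the page height stops growing. The function name scroll_until_stable and its parameters are assumptions for illustration; the only driver call used is execute_script. Jumping straight to the bottom may skip content that loads only at intermediate scroll positions, so treat this as a starting point rather than a drop-in replacement.

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=20):
    """Scroll to the bottom repeatedly until document height stops changing.

    Returns the number of scrolls performed (at most max_rounds).
    """
    last_height = driver.execute_script('return document.body.scrollHeight')
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script('window.scrollTo(0, document.body.scrollHeight)')
        time.sleep(pause)  # give lazily loaded content time to arrive
        rounds += 1
        new_height = driver.execute_script('return document.body.scrollHeight')
        if new_height == last_height:
            break  # nothing new was appended, assume we reached the end
        last_height = new_height
    return rounds
```

With the browser object from the example this would be called as scroll_until_stable(b) before handing b.page_source to BeautifulSoup.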

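Partial scrolling (Xigua Video)

1) window.scrollBy(x offset, y offset) - scrolls the whole page
2) jsTagObject.scrollBy(x offset, y offset) - scrolls the content inside the given tag
Ways to get a tag in JS:
document.getElementById(id value) - gets the tag whose id attribute equals the given value; returns that tag object
document.getElementsByClassName(class value) - gets all tags whose class attribute contains the given value; returns an array-like collection whose items are tags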

from selenium.webdriver import Chrome
import time

b = Chrome()
b.get('https://www.ixigua.com/?wid_try=1')
for _ in range(10):
    time.sleep(1)
    # b.execute_script('window.scrollBy(0, 800)')
    b.execute_script("document.getElementsByClassName('v3-app-layout__side__Normal')[0].scrollBy(0, 200)")

input('end:')

b.close()
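
Instead of looking the tag up again inside the JavaScript string, Selenium can pass an element it has already located into the script, where it shows up as arguments[0]. The sketch below wraps that in two small helpers; the names scroll_element_by and scroll_element_repeatedly are hypothetical, and the code assumes the element itself is the scrollable container.

```python
import time


def scroll_element_by(driver, element, dx, dy):
    """Scroll the content of `element` by (dx, dy) pixels via its own scrollBy."""
    driver.execute_script(
        'arguments[0].scrollBy(arguments[1], arguments[2]);', element, dx, dy
    )


def scroll_element_repeatedly(driver, element, dy, times, pause=1.0):
    """Scroll `element` down by dy pixels `times` times, pausing between scrolls."""
    for _ in range(times):
        scroll_element_by(driver, element, 0, dy)
        time.sleep(pause)
```

For the Xigua sidebar this might look like side = b.find_element(By.CLASS_NAME, 'v3-app-layout__side__Normal') followed by scroll_element_repeatedly(b, side, 200, 10), with By imported from selenium.webdriver.common.by.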

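Waits

1. Implicit wait - applies only to operations that fetch tags through Selenium
With an implicit wait set, looking up a tag that does not exist yet does not fail immediately. The lookup keeps retrying within the given time until the tag is found or the time runs out (if it times out without finding the tag, an error is raised).
The implicit wait only needs to be set once per browser object; it then applies to every tag lookup.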

1) Create the browser object

b = Chrome()

2) Set the browser's implicit wait time (in seconds)

b.implicitly_wait(5)

3) Open the page

b.get('https://www.jd.com')
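
To make the "keep retrying until found or timed out" behaviour concrete, here is a conceptual sketch in plain Python. It is not how Selenium implements implicit waits internally; retry_until is a made-up helper, and LookupError stands in for the "tag not found" error a real lookup would raise.

```python
import time


def retry_until(find, timeout, interval=0.5):
    """Call find() until it succeeds or `timeout` seconds have passed.

    A LookupError from find() means "not there yet"; once the deadline
    has passed, the last error is re-raised instead of being swallowed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return find()
        except LookupError:
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval)
```

The lookup is always attempted at least once, so a timeout of 0 behaves like a lookup with no waiting at all.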

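2. Explicit wait - wait until a condition holds (or stops holding) before running the code that follows

Usage:
1) Create the wait object: WebDriverWait(browser object, timeout in seconds)
2) Add a wait condition:
waitObject.until(condition) - the code continues only once the condition holds
waitObject.until_not(condition) - the code continues only once the condition no longer holds

3) Common conditions:
presence_of_element_located(tag) - the tag exists in the page
visibility_of_element_located(tag) - the tag is visible

text_to_be_present_in_element(tag, value) - the tag's text content contains the value
text_to_be_present_in_element_value(tag, value) - the tag's value attribute contains the value

Note: a tag is written as (By.lookup method, data)
For example: (By.ID, 'key') - the tag whose id attribute is key
(By.CSS_SELECTOR, '#p1 a')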

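from selenium.webdriver import Chrome
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

b = Chrome()
b.get('https://www.jd.com')

wait = WebDriverWait(b, 30)  # give up after 30 seconds

# Wait until the value attribute of the tag with id 'key' contains '手机' (phone),
# for example after typing it into the search box by hand
wait.until(EC.text_to_be_present_in_element_value((By.ID, 'key'), '手机'))

print('===== "手机" appeared! =======')

input('end:')
b.close()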
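
until also accepts any callable that takes the browser object and returns a truthy value once the wait should end, so a condition can be written by hand when the built-in ones do not fit. The sketch below rebuilds a simplified version of the "value contains" check; value_contains is a hypothetical name, and the locator is the same (By.method, data) pair described above.

```python
def value_contains(locator, text):
    """Build a wait condition that holds once the tag's value attribute contains text."""
    def condition(driver):
        # locator is a (by, value) pair such as (By.ID, 'key')
        value = driver.find_element(*locator).get_attribute('value')
        return value is not None and text in value
    return condition
```

It would be used the same way as the built-in condition: wait.until(value_contains((By.ID, 'key'), '手机')).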

Basic configuration (disable image loading so pages load faster)

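from selenium.webdriver import Chrome, ChromeOptions

# 1. Create the options object
options = ChromeOptions()
# 1) Drop the enable-automation switch (hides the "controlled by automated test software" banner)
options.add_experimental_option('excludeSwitches', ['enable-automation'])
# 2) Disable image loading (2 means block)
options.add_experimental_option("prefs", {"profile.managed_default_content_settings.images": 2})

# 2. Pass the options in when creating the browser object
b = Chrome(options=options)

b.get('https://www.jd.com')

input('end:')
b.close()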
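
If the same two settings are needed in several scripts, they can be wrapped in a small function with switches. This is only a convenience sketch: configure_options and its parameter names are invented here, and it calls nothing on the options object beyond the add_experimental_option method already used above.

```python
def configure_options(options, block_images=True, hide_automation_switch=True):
    """Apply the two settings from the example to an existing options object."""
    if hide_automation_switch:
        options.add_experimental_option('excludeSwitches', ['enable-automation'])
    if block_images:
        # 2 blocks images for every site
        options.add_experimental_option(
            'prefs', {'profile.managed_default_content_settings.images': 2}
        )
    return options
```

A call such as b = Chrome(options=configure_options(ChromeOptions())) would then give the same setup as the example.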

Original article: https://blog.csdn.net/qq_40752621/article/details/125452017