Quoted from the free WeChat mini program: 皆我百晓生
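Judging from the code you provided, the main problem is how it scrolls to the bottom of the page and captures the remaining part. After the loop finishes, you scroll to a specific position again and capture what is left, but that code probably does not wait long enough after scrolling, so the screenshot does not capture the intended region accurately.
To fix this, the scrolling logic in long_sc can be reworked so that the page has enough time to finish loading after every scroll, and the way the last screenshot is taken should be adjusted so that it captures the content that is actually visible. Here is the modified code: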

import undetected_chromedriver as uc
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from PIL import Image
import time
import math

fold_path = "D://Desktop//news//"  # Output folder; it must already exist
screenshot_name = "news"
sc_hight = 500  # Height of each screenshot segment, in CSS pixels
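
def short_sc(el, driver):
    start_height = el.location["y"]
    js = "window.scrollTo(0, %s);" % (start_height)
    driver.execute_script(js)
    time.sleep(1)  # Wait for the page to settle after scrolling
    # save_screenshot captures only the viewport, so convert the element's
    # page coordinates into viewport pixels before cropping.
    offset = driver.execute_script("return window.pageYOffset;")
    ratio = driver.execute_script("return window.devicePixelRatio;")
    fp = fold_path + f'{screenshot_name}.png'
    driver.save_screenshot(fp)
    img = Image.open(fp)
    left = el.location["x"] * ratio
    top = (start_height - offset) * ratio
    img2 = img.crop((round(left), round(top),
                     round(left + el.size["width"] * ratio),
                     round(top + el.size["height"] * ratio)))
    img2.save(fp)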
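
def long_sc(el, driver):
    # Assumes the element starts at the top of the page and that sc_hight
    # does not exceed the viewport height.
    total_height = el.size["height"]
    width = el.size["width"]
    ratio = driver.execute_script("return window.devicePixelRatio;")
    segments = math.ceil(total_height / sc_hight)
    short_screen_list = []
    for i in range(segments):
        top = i * sc_hight
        bottom = min(top + sc_hight, total_height)
        driver.execute_script(f"window.scrollTo(0, {top});")
        time.sleep(1)  # Wait for the page to settle after scrolling
        # The browser clamps the scroll near the bottom, so read the real offset
        offset = driver.execute_script("return window.pageYOffset;")
        fp = fold_path + f'{screenshot_name}_{i}.png'
        driver.save_screenshot(fp)
        img = Image.open(fp)
        # The screenshot shows only the viewport: crop in viewport pixels
        box = (0, round((top - offset) * ratio),
               round(width * ratio), round((bottom - offset) * ratio))
        img2 = img.crop(box)
        img2.save(fp)
        short_screen_list.append(img2)
    # Stitch the segments vertically, in order
    stitched_height = sum(img.height for img in short_screen_list)
    new_img = Image.new("RGB", (short_screen_list[0].width, stitched_height))
    current_height = 0
    for img in short_screen_list:
        new_img.paste(img, (0, current_height))
        current_height += img.height
    new_img.save(fold_path + 'full.png')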

if __name__ == "__main__":
    driver = uc.Chrome(headless=False, use_subprocess=False)
    try:
        driver.get("https://news.baidu.com/")
        driver.maximize_window()
        el = driver.find_element(By.TAG_NAME, "body")
        if el.size["height"] > sc_hight:
            long_sc(el, driver)
        else:
            short_sc(el, driver)
    finally:
        driver.quit()  # Close the browser even if a screenshot step fails

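In this version, long_sc appends each cropped image to a list in memory as soon as it is produced, so the stitching step does not have to read the segments back from disk. Each segment is still written to a file for inspection, but the extra round of disk reads is avoided. The code also waits 1 second after every scroll so that the page can finish rendering before the screenshot is taken, and the final image is assembled by pasting the segments one below another in order.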
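A fixed 1-second sleep is simple, but it can be too short on a slow page and wasteful on a fast one. As a rough alternative, the sketch below polls a value until two consecutive reads agree. The helper name wait_until_stable and its parameters are hypothetical and not part of the original code, and a stable scroll offset only shows that scrolling has stopped, not that lazily loaded content has arrived.

```python
import time


def wait_until_stable(read_value, interval=0.1, timeout=5.0):
    """Poll read_value() until two consecutive reads match or timeout elapses.

    Returns the last value read. This is a hypothetical helper, not a
    Selenium API.
    """
    deadline = time.monotonic() + timeout
    previous = read_value()
    while time.monotonic() < deadline:
        time.sleep(interval)
        current = read_value()
        if current == previous:
            return current
        previous = current
    return previous


# Assumed usage with a Selenium driver (not executed here):
# offset = wait_until_stable(
#     lambda: driver.execute_script("return window.pageYOffset;"))
```

In long_sc, a call like the commented one could stand in for time.sleep(1) after each window.scrollTo, and its return value is the real scroll offset the browser ended up at.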
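In addition, the cropping logic in short_sc was adjusted so that it keeps only the area covered by the body element rather than the full height of the screen. This helps the screenshot come out correctly even when the page is shorter than the screen.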
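Cropping correctly depends on one detail: save_screenshot captures only the visible viewport, so a crop box has to be expressed in viewport pixels rather than page coordinates. The sketch below shows that arithmetic on plain numbers. The function names crop_box and segment_boxes are hypothetical, the clamped scroll offset is simulated here (real code should read window.pageYOffset), and it assumes the segment height does not exceed the viewport height.

```python
import math


def crop_box(seg_top, seg_bottom, scroll_offset, width, ratio=1.0):
    """Map a page-coordinate slice to a pixel box inside a viewport screenshot."""
    top = round((seg_top - scroll_offset) * ratio)
    bottom = round((seg_bottom - scroll_offset) * ratio)
    return (0, top, round(width * ratio), bottom)


def segment_boxes(total_height, segment_height, viewport_height, width, ratio=1.0):
    """Return one crop box per segment, simulating the browser's scroll clamp."""
    max_scroll = max(total_height - viewport_height, 0)
    boxes = []
    for i in range(math.ceil(total_height / segment_height)):
        seg_top = i * segment_height
        seg_bottom = min(seg_top + segment_height, total_height)
        # The browser cannot scroll past max_scroll, so the offset is clamped
        offset = min(seg_top, max_scroll)
        boxes.append(crop_box(seg_top, seg_bottom, offset, width, ratio))
    return boxes


# A 1200px page, 500px segments, 800px viewport, 1000px wide, ratio 1
print(segment_boxes(1200, 500, 800, 1000))
# → [(0, 0, 1000, 500), (0, 100, 1000, 600), (0, 600, 1000, 800)]
```

The last two boxes start below the top of the screenshot because the page cannot scroll further than 400px, which is exactly the case where cropping from the top of each screenshot would duplicate or lose content.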
