• [Tutorial] Downloading Baidu Rotation-CAPTCHA Images with Multiple Processes to Build a Dataset


    Please credit the source when reposting: 小锋学长生活大爆炸 [xfxuezhang.cn]

    Companion tool for building the dataset: "[Tool] Rotate Images: a Dataset-Creation Tool", open source!

    Result preview:

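    Here is the code, ready to use out of the box (you do need to install the selenium library yourself, along with requests and webdriver_manager, which the script also imports):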

    import os
    import time
    from multiprocessing import Process

    import requests
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service as ChromeService
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.wait import WebDriverWait
    from webdriver_manager.chrome import ChromeDriverManager


    # Download the rotation image from the given URL
    def get_img(url):
        header = {
            "Host": "passport.baidu.com",
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:93.0) Gecko/20100101 Firefox/93.0",
            "Accept": "image/avif,image/webp,*/*",
            "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
            "Accept-Encoding": "gzip, deflate, br",
            "Referer": "https://wappass.baidu.com/",
            "Connection": "keep-alive",
            "Cookie": 'Hm_lvt_3eecc7feff77952670b7c24e952e8773=1666849322,1666919008,1666961940,1667175865; Hm_lpvt_3eecc7feff77952670b7c24e952e8773=1667186488; token="MTY2NzE4NzczNS4yMTEzMjg1OmQwNDNhNmZiZTA4MjlmOGY1YjE0MjA0NmViN2M1NTdkM2MyYWY3NzE="; sessionid=aa6zibdmfbs5cwzh6x62niw7fbqe5pon',
            "Sec-Fetch-Dest": "image",
            "Sec-Fetch-Mode": "no-cors",
            "Sec-Fetch-Site": "same-site",
            "Pragma": "no-cache",
            "Cache-Control": "no-cache",
        }
        response = requests.get(url=url, headers=header)
        if response.status_code == 200:
            # Use the process ID plus a nanosecond timestamp in the name, so that
            # parallel processes saving within the same second do not overwrite each other
            filename = f"images/{os.getpid()}_{time.time_ns()}.jpg"
            with open(filename, 'wb') as f:
                f.write(response.content)


    def main():
        driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))
        driver.implicitly_wait(5)
        while True:
            # Open the Baidu rotation-CAPTCHA page
            driver.get('https://wappass.baidu.com/static/captcha/tuxing.html?&ak=c27bbc89afca0463650ac9bde68ebe06&backurl=https%3A%2F%2Fwww.baidu.com%2Fs%3Fcl%3D3%26tn%3Dbaidutop10%26fr%3Dtop1000%26wd%3D%25E6%25B6%2588%25E9%2598%25B2%25E6%2588%2598%25E5%25A3%25AB%25E8%25BF%259E%25E5%25A4%259C%25E7%25AD%2591%25E5%259D%259D%25E5%25BA%2594%25E5%25AF%25B9%25E6%25B4%25AA%25E5%25B3%25B0%25E8%25BF%2587%25E5%25A2%2583%26rsv_idx%3D2%26rsv_dl%3Dfyb_n_homepage%26hisfilter%3D1&logid=8309940529500911554&signature=4bce59041938b160b7c24423bde0b518&timestamp=1624535702')
            # Wait for the slider button to appear
            WebDriverWait(driver, 10).until(lambda x: x.find_element(By.XPATH, value='//div[@class="passMod_slide-btn "]'))
            time.sleep(1)
            # Wait for the CAPTCHA image to appear
            WebDriverWait(driver, 10).until(lambda x: x.find_element(By.XPATH, value='//img[@class="passMod_spin-background"]'))
            img_src = driver.find_element(By.XPATH, value='//img[@class="passMod_spin-background"]').get_attribute('src')
            # Download the image
            get_img(img_src)


    if __name__ == '__main__':
        # Download Baidu rotation-CAPTCHA images with multiple processes
        os.makedirs('images', exist_ok=True)
        for i in range(5):
            print(f'Process {i} started')
            p = Process(target=main, name=f"work_{i}")
            p.start()
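
    The save step in get_img is where several processes write into the same images folder. As an alternative naming scheme, here is a minimal sketch that names each file after the SHA-256 hash of its bytes, so concurrent processes cannot clash on a name and an image that happens to be served twice is stored only once. The helper name save_image_by_hash is hypothetical and not part of the original script, and whether Baidu actually repeats images is an assumption.

```python
import hashlib
import os
import tempfile


def save_image_by_hash(content: bytes, out_dir: str = "images") -> str:
    """Hypothetical helper: store content as <out_dir>/<sha256>.jpg and return the path."""
    os.makedirs(out_dir, exist_ok=True)
    digest = hashlib.sha256(content).hexdigest()
    path = os.path.join(out_dir, digest + ".jpg")
    if not os.path.exists(path):
        # Write to a per-process temporary name first, then rename it into place,
        # so another process never sees a half-written image under the final name.
        tmp_path = f"{path}.{os.getpid()}.tmp"
        with open(tmp_path, "wb") as f:
            f.write(content)
        os.replace(tmp_path, path)
    return path


if __name__ == "__main__":
    # Demo with fake bytes in a throwaway directory: saving the same content
    # twice yields the same path and a single file on disk.
    with tempfile.TemporaryDirectory() as demo_dir:
        first = save_image_by_hash(b"fake image bytes", demo_dir)
        second = save_image_by_hash(b"fake image bytes", demo_dir)
        print(first == second, len(os.listdir(demo_dir)))
```

    In the script above, get_img could call save_image_by_hash(response.content) in place of its open/write lines. The trade-off is that file names no longer show when or by which process an image was fetched.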

  • Original article: https://blog.csdn.net/sxf1061700625/article/details/134279981