• Installing Selenium


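    1. Install selenium:

    pip install -U selenium

    2. Download chromedriver:

    http://chromedriver.storage.googleapis.com/index.html

    3. Put chromedriver in the root of the Python installation directory. This works as long as that directory is on your PATH, because the docstring of WebDriver's __init__ contains this note: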

    - executable_path - Deprecated: path to the executable. If the default is used it assumes the executable is in the $PATH
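The note above means the driver is located through PATH, so it can help to check what PATH actually resolves to before launching a browser. The following is a minimal sketch using only the standard library; the helper name find_driver is made up for illustration and is not part of Selenium.

```python
import os
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of the driver executable if it is on PATH, else None."""
    # shutil.which searches the directories listed in the PATH environment
    # variable (on Windows it also tries extensions such as .exe).
    return shutil.which(name)


if __name__ == "__main__":
    path = find_driver()
    if path:
        print(f"chromedriver found at: {path}")
    else:
        print("chromedriver is not on PATH; directories searched:")
        for directory in os.environ.get("PATH", "").split(os.pathsep):
            print("  ", directory)
```

If the script reports that nothing was found, either move chromedriver into one of the listed directories or add its directory to PATH.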

                             

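    How it works

    selenium automation code -> xxxdriver.exe -> browser (IE, Chrome, Firefox)

    The two sides communicate over HTTP: the client is your Python or Java code, and the server is xxxdriver.

    Communication flow:

    1. xxxdriver starts and listens on an IP address and port.

    2. Selenium WebDriver connects to xxxdriver and sends an HTTP request.

    3. xxxdriver receives the command and drives the browser.

    4. xxxdriver returns the result to Selenium WebDriver.

    5. Selenium WebDriver sends the next HTTP request.

    6. When the work is done, the connection is closed, the driver service is shut down, and the browser is closed.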
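The six steps above can be imitated without any browser. In the sketch below a tiny local HTTP server plays the role of xxxdriver and a urllib client plays the role of Selenium WebDriver. Everything here is an assumption made for illustration: FakeDriverHandler, send, run_demo and the session ID "fake-session-1" are invented names, and the fake server only answers with JSON instead of driving a browser.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeDriverHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for xxxdriver: answers every request with JSON."""

    def _reply(self, value):
        payload = json.dumps({"value": value}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the JSON command body
        if self.path == "/session":
            self._reply({"sessionId": "fake-session-1"})
        else:
            # Step 3 would happen here: a real driver drives the browser.
            self._reply(None)

    def do_DELETE(self):
        self._reply(None)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def send(base_url, method, path, body=None):
    """Send one JSON command over HTTP and return the decoded JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        base_url + path, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    # Bypass any system proxy so the request reaches the local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return json.loads(response.read().decode("utf-8"))


def run_demo():
    # Step 1: the "driver" starts and listens on an IP address and port.
    server = HTTPServer(("127.0.0.1", 0), FakeDriverHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base_url = f"http://127.0.0.1:{server.server_port}"
    try:
        # Step 2: the client connects and asks for a new session.
        session = send(base_url, "POST", "/session", {"capabilities": {}})
        session_id = session["value"]["sessionId"]
        # Steps 3 to 5: send a command, read the result, repeat as needed.
        result = send(base_url, "POST", f"/session/{session_id}/url",
                      {"url": "https://www.baidu.com"})
        # Step 6: end the session.
        send(base_url, "DELETE", f"/session/{session_id}")
    finally:
        server.shutdown()
        server.server_close()
    return session_id, result


if __name__ == "__main__":
    print(run_demo())
```

The URL shapes used here (/session, /session/<id>/url) follow the W3C WebDriver endpoints, but the replies are deliberately minimal and should not be read as what a real driver returns.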

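    A simple example makes the principle visible if you step through the source: every command is an HTTP request whose payload is JSON.

    In essence, every operation on a web page is an API call defined by a URL, a request method, and JSON request data. The protocol is called the JSON Wire Protocol (Selenium 4, whose source is quoted below, speaks its successor, the W3C WebDriver protocol, which works the same way).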

    from selenium import webdriver
    # Open the browser and establish a session with it:
    # this starts chromedriver.exe, connects to it, and obtains a session ID
    driver = webdriver.Chrome()
    driver.get("https://www.baidu.com")

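    1) Click through to the definition of the get method.

    2) Then click through to the execute method. The line to focus on is the one that assigns response, which calls command_executor.execute: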

    def execute(self, driver_command: str, params: dict = None) -> dict:
        """
        Sends a command to be executed by a command.CommandExecutor.
        :Args:
         - driver_command: The name of the command to execute as a string.
         - params: A dictionary of named parameters to send with the command.
        :Returns:
          The command's JSON response loaded into a dictionary object.
        """
        if self.session_id:
            if not params:
                params = {'sessionId': self.session_id}
            elif 'sessionId' not in params:
                params['sessionId'] = self.session_id
        params = self._wrap_value(params)
        response = self.command_executor.execute(driver_command, params)
        if response:
            self.error_handler.check_response(response)
            response['value'] = self._unwrap_value(
                response.get('value', None))
            return response
        # If the server doesn't send a response, assume the command was
        # a success
        return {'success': 0, 'value': None, 'sessionId': self.session_id}

    3) Click through to command_executor's execute() method. Its last line calls the _request method:

    def execute(self, command, params):
        """
        Send a command to the remote server.
        Any path substitutions required for the URL mapped to the command should be
        included in the command parameters.
        :Args:
         - command - A string specifying the command to execute.
         - params - A dictionary of named parameters to send with the command as
           its JSON payload.
        """
        command_info = self._commands[command]
        assert command_info is not None, 'Unrecognised command %s' % command
        path = string.Template(command_info[1]).substitute(params)
        if isinstance(params, dict) and 'sessionId' in params:
            del params['sessionId']
        data = utils.dump_json(params)
        url = f"{self._url}{path}"
        return self._request(command_info[0], url, body=data)
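The lookup and path substitution in this method can be tried in isolation. The sketch below is a simplified imitation, not Selenium's code: COMMANDS is a hand-written, abridged table, build_request is a made-up helper, and the base URL assumes a driver listening on localhost port 9515.

```python
import json
import string

# Hypothetical, abridged command table: name -> (HTTP method, URL template).
COMMANDS = {
    "newSession": ("POST", "/session"),
    "get": ("POST", "/session/$sessionId/url"),
    "quit": ("DELETE", "/session/$sessionId"),
}


def build_request(base_url, command, params):
    """Turn a command name plus parameters into (method, url, JSON body)."""
    method, template = COMMANDS[command]
    # $sessionId in the template is filled in from the parameters.
    path = string.Template(template).substitute(params)
    # The session ID travels in the URL, so it is left out of the JSON body.
    body = {key: value for key, value in params.items() if key != "sessionId"}
    return method, f"{base_url}{path}", json.dumps(body)


method, url, body = build_request(
    "http://localhost:9515",  # assumed driver address
    "get",
    {"sessionId": "abc123", "url": "https://www.baidu.com"},
)
print(method, url, body)
# POST http://localhost:9515/session/abc123/url {"url": "https://www.baidu.com"}
```

This is why passing the right command name and parameters is all that is needed: the method, URL and body of the HTTP request follow mechanically from them.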

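    4) Click through to the _request method. It simply issues an HTTP request, so as long as the parameters passed in at the start are correct, the correct HTTP request is sent.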

    def _request(self, method, url, body=None):
        """
        Send an HTTP request to the remote server.
        :Args:
         - method - A string for the HTTP method to send the request with.
         - url - A string for the URL to send the request to.
         - body - A string for request body. Ignored unless method is POST or PUT.
        :Returns:
          A dictionary with the server's parsed JSON response.
        """
        LOGGER.debug(f"{method} {url} {body}")
        parsed_url = parse.urlparse(url)
        headers = self.get_remote_connection_headers(parsed_url, self.keep_alive)
        response = None
        if body and method not in ("POST", "PUT"):
            body = None
        if self.keep_alive:
            response = self._conn.request(method, url, body=body, headers=headers)
            statuscode = response.status
        else:
            conn = self._get_connection_manager()
            with conn as http:
                response = http.request(method, url, body=body, headers=headers)
            statuscode = response.status
            if not hasattr(response, 'getheader'):
                if hasattr(response.headers, 'getheader'):
                    response.getheader = lambda x: response.headers.getheader(x)
                elif hasattr(response.headers, 'get'):
                    response.getheader = lambda x: response.headers.get(x)
        data = response.data.decode('UTF-8')
        LOGGER.debug(f"Remote response: status={response.status} | data={data} | headers={response.headers}")
        try:
            if 300 <= statuscode < 304:
                return self._request('GET', response.getheader('location'))
            if 399 < statuscode <= 500:
                return {'status': statuscode, 'value': data}
            content_type = []
            if response.getheader('Content-Type'):
                content_type = response.getheader('Content-Type').split(';')
            if not any([x.startswith('image/png') for x in content_type]):
                try:
                    data = utils.load_json(data.strip())
                except ValueError:
                    if 199 < statuscode < 300:
                        status = ErrorCode.SUCCESS
                    else:
                        status = ErrorCode.UNKNOWN_ERROR
                    return {'status': status, 'value': data.strip()}
                # Some drivers incorrectly return a response
                # with no 'value' field when they should return null.
                if 'value' not in data:
                    data['value'] = None
                return data
            else:
                data = {'status': 0, 'value': data}
                return data
        finally:
            LOGGER.debug("Finished Request")
            response.close()
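The second half of _request, which turns the HTTP response into a dictionary, is easier to follow as a standalone function. The sketch below is a simplified imitation under stated assumptions: interpret_response is a made-up name, redirect handling is left out, and SUCCESS and UNKNOWN_ERROR are placeholder constants rather than Selenium's ErrorCode values.

```python
import json

# Placeholder status values for this sketch only.
SUCCESS = 0
UNKNOWN_ERROR = "unknown error"


def interpret_response(status_code, content_type, text):
    """Map an HTTP status, Content-Type header and body text to a result dict."""
    # 4xx and 500 responses are reported as errors with the raw body.
    if 399 < status_code <= 500:
        return {"status": status_code, "value": text}
    # PNG payloads are passed through without JSON parsing.
    parts = [part.strip() for part in content_type.split(";")]
    if any(part.startswith("image/png") for part in parts):
        return {"status": SUCCESS, "value": text}
    try:
        data = json.loads(text.strip())
    except ValueError:
        # Not JSON: judge success purely by the HTTP status code.
        status = SUCCESS if 199 < status_code < 300 else UNKNOWN_ERROR
        return {"status": status, "value": text.strip()}
    # Some drivers omit 'value' when they should send null.
    if isinstance(data, dict) and "value" not in data:
        data["value"] = None
    return data


print(interpret_response(200, "application/json; charset=utf-8", '{"sessionId": "abc"}'))
print(interpret_response(404, "application/json", "not found"))
print(interpret_response(200, "text/plain", "ok"))
```

The three calls exercise the JSON branch, the error branch and the non-JSON fallback in that order.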

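    Ending the session: close the browser and shut down chromedriver

    driver.quit() ends the whole session: it closes every browser window and stops chromedriver.

    driver.close() closes only the current window.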
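Because quit() is what stops chromedriver, it is worth making sure it runs even when a step in between raises an exception. A minimal sketch, assuming selenium and a matching chromedriver are installed; open_page is a made-up helper name:

```python
def open_page(url):
    """Open url in Chrome, return the page title, and always end the session."""
    # Imported inside the function so this file can still be loaded
    # on a machine where selenium is not installed.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return driver.title
    finally:
        # Runs on success and on error: closes all windows and
        # shuts down the chromedriver process.
        driver.quit()
```

Calling open_page("https://www.baidu.com") starts a browser, so it is best run by hand rather than on import.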

  • Original article: https://blog.csdn.net/qq_30353203/article/details/126131415