A Summary of Selenium Selectors

2024/7/19 8:57:15 Tags: selenium, python, testing tools, web scraping

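Preface

I have recently been collecting data from a number of websites. Most sites now load their data dynamically; the most common case is a list that refreshes as you scroll down. Traditional Scrapy crawlers cannot cope with this on their own. Plugins exist that bring Selenium, Playwright and similar tools into Scrapy, but they are not yet mature, so I have largely given up on handling everything with Scrapy. I now use Scrapy for static sites and rely entirely on Selenium, Puppeteer or Playwright for dynamic ones.
After trying all three, my basic conclusion is:

Selenium handles most dynamic-data scraping without any trouble. Its support for Python and other languages is better than that of Puppeteer and Playwright, and overall it is somewhat more mature and stable. Its documentation is also richer, and answers to most problems are easy to find online.

Of course, this conclusion may no longer hold in six months, since the other two are developing quickly. Back to the topic: once you have mastered Selenium's selectors, you have mastered half of Selenium. CSS selectors in particular are concise and easy to use, and should be your first choice.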

Typical selection methods

Selecting by ID

If the element has an ID, prefer locating it by ID.

XPath: //div[@id='example'] 
CSS: #example

Selecting by element type

XPath: //input
CSS: input

Direct children

XPath uses a slash "/" to denote a direct child; CSS selectors use ">".

Example:

XPath: //div/a
CSS: div > a

Descendants (not necessarily direct children)

XPath uses a double slash "//"; CSS uses a space. Example:

XPath: //div//a
CSS: div a

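Selecting by class name

XPath: "[@class='example']"
CSS: the class selector is simply a dot ".".
Note that the XPath form below matches only when the class attribute is exactly "example", whereas .example also matches elements that carry additional classes.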

XPath: //div[@class='example']
CSS: .example
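
As a minimal sketch of how these locator pairs reach Selenium from Python: find_elements takes a strategy name and a selector string. The strategy names below are the string values of Selenium's By.XPATH and By.CSS_SELECTOR constants, written out so the snippet can be read and run without a browser. The BASIC_LOCATORS table and the find_both helper are illustrative names, not part of Selenium.

```python
# String values of selenium.webdriver.common.by.By.XPATH / By.CSS_SELECTOR.
XPATH = "xpath"
CSS = "css selector"

# The XPath/CSS pairs from the sections above, keyed by what they select.
BASIC_LOCATORS = {
    "id": ((XPATH, "//div[@id='example']"), (CSS, "#example")),
    "element type": ((XPATH, "//input"), (CSS, "input")),
    "direct child": ((XPATH, "//div/a"), (CSS, "div > a")),
    "descendant": ((XPATH, "//div//a"), (CSS, "div a")),
    "class": ((XPATH, "//div[@class='example']"), (CSS, ".example")),
}


def find_both(driver, kind):
    """Query the driver with both locators of one table entry.

    `driver` is assumed to be a Selenium WebDriver (any object exposing
    find_elements(by, value) works). Returns (xpath_hits, css_hits).
    """
    xpath_locator, css_locator = BASIC_LOCATORS[kind]
    return driver.find_elements(*xpath_locator), driver.find_elements(*css_locator)
```

With a real session, one would create the driver (for example webdriver.Chrome()), load a page with driver.get(url), and then call find_both(driver, "direct child"). The two result lists can differ when one form is stricter than the other, as with the class example.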

Selecting by element text

XPath: //*[text()='Get started free']
XPath: //*[contains(text(), 'Get started')]
CSS: no equivalent. Standard CSS selectors cannot match on text content, so use XPath here.
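
Text that itself contains quotes needs some care, because XPath 1.0 string literals have no escape character. The sketch below shows one common way to handle it, falling back to concat() when both quote types appear. The helper names xpath_literal, by_exact_text and by_partial_text are hypothetical and not part of Selenium.

```python
def xpath_literal(text):
    """Quote `text` so it can be embedded in an XPath 1.0 expression."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Both quote types present: split on ' and glue the pieces with concat().
    parts = text.split("'")
    return "concat(" + ', "\'", '.join(f"'{p}'" for p in parts) + ")"


def by_exact_text(text):
    """XPath for any element whose text node equals `text`."""
    return f"//*[text()={xpath_literal(text)}]"


def by_partial_text(text):
    """XPath for any element with a text node containing `text`."""
    return f"//*[contains(text(), {xpath_literal(text)})]"


print(by_exact_text("Get started free"))
print(by_partial_text("Get started"))
```

The resulting strings can be passed to Selenium with the XPath strategy, for example driver.find_element(By.XPATH, by_partial_text("Get started")).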

Advanced CSS selector usage

Next sibling

This is useful for navigating lists of elements, such as forms or ul items. The next-sibling selector tells Selenium to find the next adjacent element on the page that is inside the same parent. Let's show an example using a login form, selecting the field after username.

[Login form]

Let's write an XPath and a CSS selector that will choose the input field after "username". This will select the "alias" input, or a different element if the form is reordered.

XPath: //input[@id='username']/following-sibling::input[1]
CSS: #username + input
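
One subtlety is worth a sketch: the CSS "+" combinator only matches the element immediately after the anchor, while following-sibling::input[1] skips over intervening elements of other types. The snippet below emulates both rules with the standard library's xml.etree.ElementTree rather than a browser. The form markup and the two helper names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical form: a <br/> sits between the username and alias inputs.
FORM = """<form>
  <input id="username" name="username"/>
  <br/>
  <input id="alias" name="alias"/>
  <input type="submit" name="login"/>
</form>"""


def adjacent_sibling(parent, anchor_id, tag):
    """Emulate `#anchor_id + tag`: the very next element, if it has that tag."""
    children = list(parent)
    for i, child in enumerate(children[:-1]):
        if child.get("id") == anchor_id:
            nxt = children[i + 1]
            return nxt if nxt.tag == tag else None
    return None


def first_following_sibling(parent, anchor_id, tag):
    """Emulate `//*[@id='anchor_id']/following-sibling::tag[1]`."""
    seen_anchor = False
    for child in parent:
        if seen_anchor and child.tag == tag:
            return child
        if child.get("id") == anchor_id:
            seen_anchor = True
    return None


form = ET.fromstring(FORM)
print(adjacent_sibling(form, "username", "input"))
print(first_following_sibling(form, "username", "input").get("id"))
```

If such an intervening element is possible, the general sibling combinator in CSS (#username ~ input) is closer to the XPath form, though it matches every following input sibling rather than only the first.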

Attribute values

If you don't care about the ordering of child elements, you can use an attribute selector in Selenium to choose elements based on any attribute value. A good example is choosing the "username" element of the form above without adding a class or an id to it.

XPath: //input[@name='username']
CSS: input[name='username']

We can even chain filters to be more specific with our selectors.

XPath: //input[@name='login' and @type='submit']
CSS: input[name='login'][type='submit']

Here Selenium will act on the input field with name="login" and type="submit".
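
The chaining rule is mechanical enough to sketch in a few lines: CSS appends one bracketed filter per attribute, while XPath joins the conditions with "and" inside a single predicate. The two builder functions are hypothetical helpers, and they assume attribute values contain no single quotes.

```python
def css_with_attrs(tag, attrs):
    """Build `tag[name='value'][...]`; every attribute must match."""
    return tag + "".join(f"[{name}='{value}']" for name, value in attrs.items())


def xpath_with_attrs(tag, attrs):
    """Build `//tag[@name='value' and ...]`; every attribute must match."""
    conditions = " and ".join(f"@{name}='{value}'" for name, value in attrs.items())
    return f"//{tag}[{conditions}]" if conditions else f"//{tag}"


filters = {"name": "login", "type": "submit"}
print(css_with_attrs("input", filters))
print(xpath_with_attrs("input", filters))
```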

Positional matching: nth-child and nth-of-type

CSS selectors in Selenium allow us to navigate lists with more finesse than the methods above. If we have a ul and want to select its fourth li element without regard to any other elements, we should use nth-child or nth-of-type. nth-child is a pseudo-class. In plain CSS, pseudo-classes let you style elements in a particular state or position; we can also use them to select those elements.

<ul id = "recordlist">
<li>Cat</li>
<li>Dog</li>
<li>Car</li>
<li>Goat</li>
</ul>

If we want to select the fourth li element (Goat) in this list, we can use nth-of-type, which finds the fourth li in the list. Note the single colon: pseudo-classes such as nth-of-type take one colon, while the double colon is reserved for pseudo-elements such as ::before.

CSS: #recordlist li:nth-of-type(4)

On the other hand, if we want the fourth child only if it is an li element, we can use a filtered nth-child. For the list exactly as shown this is again Goat; it would select Car instead if one non-li element (a heading, say) came before the list items inside the ul.

CSS: #recordlist li:nth-child(4)

Note that if you don't specify an element type for nth-child, it selects the fourth child without regard to type. This may be useful when testing CSS layout in Selenium.

CSS: #recordlist *:nth-child(4)

In XPath this would be similar to using [4].
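
To see the difference between the two pseudo-classes without a browser, the sketch below emulates them with the standard library's xml.etree.ElementTree. The extra p element at the top of the list is an assumption added for illustration, since the two selectors only diverge when a non-li child precedes the items. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The list from above, plus a hypothetical <p> heading as the first child.
LIST = """<ul id="recordlist">
<p>Heading</p>
<li>Cat</li>
<li>Dog</li>
<li>Car</li>
<li>Goat</li>
</ul>"""


def nth_child(parent, tag, n):
    """Emulate `tag:nth-child(n)`: the n-th child overall, if it has that tag."""
    children = list(parent)
    if 1 <= n <= len(children) and children[n - 1].tag == tag:
        return children[n - 1]
    return None


def nth_of_type(parent, tag, n):
    """Emulate `tag:nth-of-type(n)`: the n-th child among those with that tag."""
    same_type = [child for child in parent if child.tag == tag]
    return same_type[n - 1] if 1 <= n <= len(same_type) else None


ul = ET.fromstring(LIST)
print(nth_of_type(ul, "li", 4).text)  # fourth li
print(nth_child(ul, "li", 4).text)    # fourth child, which is an li here
print(ul.find("li[4]").text)          # XPath-style position, counted per tag
```

Here the XPath-style li[4] counts only li elements, so it lines up with nth-of-type rather than nth-child.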

Substring matching

A distinctive feature of CSS selectors is substring matching on attribute values, using ^=, $= or *=.

^= matches a prefix
CSS: a[id^='id_prefix_']
A link with an "id" that starts with the text "id_prefix_"

$= matches a suffix
CSS: a[id$='_id_suffix']
A link with an "id" that ends with the text "_id_suffix"

*= matches a substring
CSS: a[id*='id_pattern']
A link with an "id" that contains the text "id_pattern"
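
The three operators behave much like Python's startswith, endswith and in. The sketch below emulates them on plain strings so the matching rules can be checked without a browser. The attr_matches helper and the sample ids are assumptions made for illustration.

```python
def attr_matches(value, operator, pattern):
    """Emulate the CSS substring operators on a single attribute value."""
    if pattern == "":
        # Per the Selectors specification, an empty pattern matches nothing.
        return False
    if operator == "^=":
        return value.startswith(pattern)
    if operator == "$=":
        return value.endswith(pattern)
    if operator == "*=":
        return pattern in value
    raise ValueError(f"unsupported operator: {operator}")


# Hypothetical link ids, used only to exercise the three operators.
ids = ["id_prefix_home", "nav_id_suffix", "x_id_pattern_y"]

for op, pattern in [("^=", "id_prefix_"), ("$=", "_id_suffix"), ("*=", "id_pattern")]:
    matched = [i for i in ids if attr_matches(i, op, pattern)]
    print(f"a[id{op}'{pattern}'] ->", matched)
```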

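Summary

This article walked through the basic usage of Selenium selectors, to help you get comfortable with this powerful test-automation tool. Of course, if you would rather use it for operations automation or for web scraping, nobody is stopping you.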


http://www.niftyadmin.cn/n/10766.html
