regex - How to use regular expressions in Selenium locators

Question

I'm using Selenium RC and I would like, for example, to get all the link elements whose href attribute matches:

http://[^/]*\d+com

I would like to use something like:

sel.get_attribute( '//a[regx:match(@href, "http://[^/]*\d+.com")]/@name' )

That would return a list of the name attributes of all the links that match the regex (or something like it).

Thanks

score 13 · Accepted Answer

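The answer above is probably the right way to find ALL of the links that match a regex, but I thought it would also be helpful to answer the other part of the question: how to use a regex in XPath locators. You need to use the regex matches() function, like this: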

xpath=//div[matches(@id,'che.*boxes')]

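(Used with a click command, this would of course click the div with "id=checkboxes" or "id=cheANYTHINGHEREboxes".)

Be aware, though, that the matches function is not supported by all native browser implementations of XPath (most conspicuously, using it in FF3 throws an error: invalid xpath[2]).

If you have trouble with your particular browser (as I did with FF3), try Selenium's allowNativeXpath("false") to switch over to the JavaScript XPath interpreter. It will be slower, but it does seem to work with more XPath functions, including "matches" and "ends-with". :)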
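
To make the matching rule concrete, here is a small Python sketch that imitates how an XPath 2.0 matches() call treats its pattern: the test succeeds if any substring of the attribute value matches, unless the pattern is anchored with ^ and $. Python's re.search is only a stand-in for the XPath engine (the two regex dialects differ in details), and the sample id values are made up.

```python
import re

def xpath_matches(value, pattern):
    """Approximate XPath 2.0 matches(): true if any substring matches."""
    return re.search(pattern, value) is not None

# Invented id values, for illustration only.
ids = ["checkboxes", "cheANYTHINGHEREboxes", "my-checkboxes-2", "chebox"]

# Unanchored, as in the locator above.
unanchored = [i for i in ids if xpath_matches(i, "che.*boxes")]

# Anchored, so the whole id has to fit the pattern.
anchored = [i for i in ids if xpath_matches(i, "^che.*boxes$")]

print(unanchored)
print(anchored)
```

If only ids that consist entirely of the pattern should be selected, anchoring the expression as in the second call is probably the safer choice.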

score 3 · Accepted Answer

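You can use the Selenium command getAllLinks to get an array of the ids of the links on the page. You can then loop through them and check each href using getAttribute, which takes the locator followed by an @ and the attribute name. For example, in Java this might be: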

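// Needs: import java.util.ArrayList; import java.util.List;
// "selenium" is the Selenium RC client instance.
String[] allLinks = selenium.getAllLinks();
List<String> matchingLinks = new ArrayList<String>();

for (String linkId : allLinks) {
    if (linkId.isEmpty()) {
        continue; // links without an id are returned as ""
    }
    String linkHref = selenium.getAttribute("id=" + linkId + "@href");
    // String.matches() must match the whole href; the dot is escaped to match a literal "."
    if (linkHref.matches("http://[^/]*\\d+\\.com")) {
        matchingLinks.add(linkId);
    }
}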
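
Since the question uses the Python client, here is the same loop as a rough Python sketch. It assumes the Selenium RC Python client's get_all_links and get_attribute calls. FakeSelenium is a hypothetical stand-in so the snippet runs without a browser, and the sample links are invented. The sketch uses re.search rather than a whole-string match, so hrefs with a path after the host still qualify, and it escapes the dot, which the pattern in the question does not.

```python
import re

class FakeSelenium:
    """Hypothetical stand-in for a Selenium RC session, for illustration only."""

    def __init__(self, hrefs_by_id):
        self._hrefs = hrefs_by_id

    def get_all_links(self):
        return list(self._hrefs)

    def get_attribute(self, attribute_locator):
        # Expects the "id=<link id>@href" form used in the loop below.
        link_id = attribute_locator[len("id="):-len("@href")]
        return self._hrefs[link_id]

def matching_link_ids(sel, pattern):
    """Return the ids of links whose href contains a match for pattern."""
    regex = re.compile(pattern)
    matching = []
    for link_id in sel.get_all_links():
        if not link_id:
            continue  # links without an id come back as empty strings
        href = sel.get_attribute("id=" + link_id + "@href")
        if regex.search(href):
            matching.append(link_id)
    return matching

sel = FakeSelenium({
    "home": "http://example.com/",
    "mirror": "http://site123.com/downloads",
    "cdn": "http://cdn42.com",
})
result = matching_link_ids(sel, r"http://[^/]*\d+\.com")
print(result)
```

With a real session, the sel object would be the connected RC client instead of the stand-in.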

score 2 · Accepted Answer

A possible solution is to use sel.get_eval() and write a JS script that returns a list of the links. Something like the following answer: selenium: Is it possible to use the regexp in selenium locators
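
A minimal sketch of what such a script could look like, built from Python. It assumes that get_eval returns the value of the last JavaScript expression as a string and that this.browserbot.getCurrentWindow() gives access to the page under test. The helper names build_link_script and parse_names are hypothetical, and the snippet is untested against a live Selenium RC server.

```python
import json

def build_link_script(pattern):
    """Return JavaScript that lists the name attributes of matching links."""
    # json.dumps produces a safely quoted JavaScript string literal.
    return (
        "var re = new RegExp(" + json.dumps(pattern) + ");"
        "var links = this.browserbot.getCurrentWindow()"
        ".document.getElementsByTagName('a');"
        "var names = [];"
        "for (var i = 0; i < links.length; i++) {"
        "  if (re.test(links[i].href)) { names.push(links[i].name); }"
        "}"
        "names.join('\\n');"
    )

def parse_names(raw):
    """Split the newline-joined string from the script into a list."""
    return [name for name in raw.split("\n") if name]

script = build_link_script(r"http://[^/]*\d+\.com")
print(script)

# Assumed usage with a connected RC session named sel:
#   names = parse_names(sel.get_eval(script))
```

Joining the names with newlines keeps the return value a plain string, which avoids relying on how an array would be converted on the way back to the client.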

score 0 · Accepted Answer

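Here are some alternative methods for Selenium RC. These aren't pure Selenium solutions; they combine Selenium with the data structures of your programming language.

You can get the HTML page source and then run a regular expression over it to return a set of matching links. Use regex grouping to separate out URLs, link text/ID, etc., and you can then pass them back to Selenium to click on or navigate to.

Another method is to get the HTML page source, or the innerHTML of a parent/root element (via DOM locators), and then convert the HTML to XML as a DOM object in your programming language. You can then traverse the DOM with the desired XPath (with or without a regular expression) and obtain a node set containing only the links of interest. From there, parse out the link text/ID or URL, which you can pass back to Selenium to click on or navigate to.

Upon request, I'm providing examples below. They are in mixed languages, since the post didn't appear to be language specific anyway. I'm just using what I had available to hack together some examples. They aren't fully tested, or tested at all, but I've worked with bits of the code before in other projects, so these are proof-of-concept examples of how you would implement the solutions I just mentioned.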

//Example of element attribute processing by page source and regex (in PHP)
$pgSrc = $sel->getPageSource();
//simple hyperlink extraction via regex below (non-greedy, one match per <a> tag), replace with better regex pattern as desired
preg_match_all("/<a\s[^>]*href=\"([^\"]*)\"/i",$pgSrc,$matches,PREG_PATTERN_ORDER);
//$matches is a 2D array, $matches[0] is array of whole string matched, $matches[1] is array of what's in parenthesis
//you either get an array of all matched link URL values in parenthesis capture group or an empty array
$links = count($matches) >= 2 ? $matches[1] : array();
//now do as you wish, iterating over all link URLs
//NOTE: these are URLs only, not actual hyperlink elements

//Example of XML DOM parsing with Selenium RC (in Java)
String locator = "id=someElement";
String htmlSrcSubset = sel.getEval("this.browserbot.findElement(\""+locator+"\").innerHTML");
//using the jsoup HTML parser library for Java, see jsoup.org (imports: org.jsoup.Jsoup, org.jsoup.nodes.Document)
Document doc = Jsoup.parse(htmlSrcSubset);
/* once you have this document object, can then manipulate & traverse
it as an XML/HTML node tree. I'm not going to go into details on this
as you'd need to know XML DOM traversal and XPath (not just for finding locators).
But this tutorial URL will give you some ideas:

http://jsoup.org/cookbook/extracting-data/dom-navigation

the example there seems to indicate first getting the element/node defined
by content tag within the "document" or source, then from there get all
hyperlink elements/nodes and then traverse that as a list/array, doing
whatever you want with an object oriented approach for each element in
the array. Each element is an XML node with properties. If you study it,
you'd find this approach gives you the power/access that WebDriver/Selenium 2
now gives you with WebElements but the example here is what you can do in
Selenium RC to get similar WebElement kind of capability
*/
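
For comparison, the parse-then-filter idea can be sketched with Python's standard library alone. This is only an illustration under stated assumptions: the HTML fragment is invented and stands in for the innerHTML fetched through Selenium RC, and LinkCollector and links_matching are hypothetical names. It walks the parsed tags instead of using XPath, then filters the collected links with a regular expression.

```python
import re
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect (href, name) pairs for every <a> element in a fragment."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attributes = dict(attrs)
            if "href" in attributes:
                self.links.append((attributes["href"], attributes.get("name")))

def links_matching(html, pattern):
    """Return the (href, name) pairs whose href contains a match for pattern."""
    collector = LinkCollector()
    collector.feed(html)
    regex = re.compile(pattern)
    return [(href, name) for href, name in collector.links
            if href and regex.search(href)]

# Invented fragment standing in for the innerHTML of "id=someElement".
sample = (
    '<div id="someElement">'
    '<a name="first" href="http://site123.com/a">one</a>'
    '<a name="second" href="http://example.com/b">two</a>'
    '<a name="third">no href</a>'
    '</div>'
)
found = links_matching(sample, r"http://[^/]*\d+\.com")
print(found)
```

The resulting names or URLs could then be turned back into locators (for example "name=first") and passed to Selenium.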

score 0 · Accepted Answer

Selenium's By.Id and By.CssSelector methods do not support regex, and By.XPath only does where XPath 2.0 is enabled. If you want to use a regex, you can do something like this:

using System.Collections.Generic;
using System.Text.RegularExpressions;
using OpenQA.Selenium;

void MyCallingMethod(IWebDriver driver)
{
    //Search by ID:
    string attrName = "id";
    //Regex = 'a number that is 1-10 digits long'
    string attrRegex = "[0-9]{1,10}";
    IEnumerable<IWebElement> elements = SearchByAttribute(driver, attrName, attrRegex);
}

IEnumerable<IWebElement> SearchByAttribute(IWebDriver driver, string attrName, string attrRegex)
{
    List<IWebElement> elements = new List<IWebElement>();

    //Allows spaces around equal sign. Ex: id = "55". The group captures the attribute value.
    string searchString = attrName + "\\s*=\\s*\"(" + attrRegex + ")\"";
    //Search page source
    MatchCollection matches = Regex.Matches(driver.PageSource, searchString, RegexOptions.IgnoreCase);
    //Iterate over matches
    foreach (Match match in matches)
    {
        //Get exact attribute value from the capture group
        string attrValue = match.Groups[1].Value;
        //Quote the value: an unquoted CSS attribute value cannot start with a digit
        string cssSelector = "[" + attrName + "=\"" + attrValue + "\"]";
        //Find element by exact attribute value
        elements.Add(driver.FindElement(By.CssSelector(cssSelector)));
    }

    return elements;
}

Note: This code is untested. Also, you could optimize this method by figuring out a way to eliminate the second search.
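
The page-source step of this approach can be tried without a browser. Below is a small Python sketch of it, assuming attribute values are double-quoted in the source. The function name selectors_for_attribute and the sample markup are hypothetical. It captures each matching attribute value with a group and builds a quoted CSS attribute selector from it.

```python
import re

def selectors_for_attribute(page_source, attr_name, attr_regex):
    """Build CSS selectors for attribute values that match attr_regex."""
    # The lookbehind keeps "id" from matching inside names such as "data-id".
    # Spaces around the equals sign are allowed, e.g. id = "55".
    pattern = re.compile(
        r'(?<![\w-])' + re.escape(attr_name) + r'\s*=\s*"(' + attr_regex + r')"',
        re.IGNORECASE,
    )
    selectors = []
    for match in pattern.finditer(page_source):
        selector = '[{}="{}"]'.format(attr_name, match.group(1))
        if selector not in selectors:  # skip repeated values
            selectors.append(selector)
    return selectors

# Invented markup, for illustration only.
source = (
    '<div id="55"></div>'
    '<span id = "1234567"></span>'
    '<p id="intro"></p>'
    '<b data-id="9"></b>'
)
result = selectors_for_attribute(source, "id", "[0-9]{1,10}")
print(result)
```

Each selector could then be handed to the driver's CSS lookup, which is the second search the note above refers to.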
