
I'm trying to come up with a regular expression that tells whether a given URL is a site's index page. That means it must match domain.com, domain.com/ and domain.com/index.php, but not domain.com/page.php.

Here is the list I came up with for testing. There are a lot of permutations because of www/non-www, http/https, trailing slashes, and so on.

It should match these:

It should not match these:

(Are there any other combinations I've missed?)

What I've come up with so far is:

site.com(/|index.php|)

This is obviously wrong, because it also matches the /page values.
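To see the problem concretely, here is a small bash sketch that runs the attempted pattern through `grep -E`. It assumes GNU grep, which accepts the empty alternative in `(/|index.php|)`; other ERE implementations may reject it. Because the pattern has no `^`/`$` anchors, grep reports a match whenever `site.com` followed by one of the alternatives appears anywhere in the URL, so `/page.php` is accepted too (and the unescaped `.` matches any character). The helper name `attempt_matches` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: test the question's (unanchored) pattern with GNU grep -E.
attempt='site.com(/|index.php|)'

# attempt_matches (hypothetical helper): succeeds if the pattern
# matches anywhere in the URL passed as $1.
attempt_matches() {
  printf '%s\n' "$1" | grep -Eq "$attempt"
}

for url in 'http://site.com/' 'http://site.com/page.php'; do
  if attempt_matches "$url"; then
    echo "MATCH $url"
  else
    echo "NOT MATCH $url"
  fi
done
```

Both URLs print `MATCH`: nothing forces the match to end right after the alternatives, so the trailing `page.php` is simply ignored.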


2 Answers


This works:

^https?://[^/]+(/(\?.*|index\.php(\?.*)?)?)?$

Note that this is a generic regex; depending on your regex flavor, you may need to escape some characters.
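As one example of fitting the regex to a particular flavor, here is a sketch using bash's own `[[ =~ ]]` operator (POSIX ERE). Keeping the pattern in a single-quoted variable means no extra escaping is needed; quoting the variable on the right of `=~` would instead turn it into a literal string comparison. The helper name `is_index_url` is hypothetical.

```shell
#!/usr/bin/env bash
# The answer's regex, stored in a variable so bash doesn't reinterpret it.
index_re='^https?://[^/]+(/(\?.*|index\.php(\?.*)?)?)?$'

# is_index_url (hypothetical helper): succeeds if $1 looks like an index URL.
# $index_re must stay unquoted here so it is treated as a regex.
is_index_url() {
  [[ $1 =~ $index_re ]]
}

for url in 'https://www.site.com/?var=X' 'http://site.com/page/index.php'; do
  if is_index_url "$url"; then
    echo "MATCH $url"
  else
    echo "NOT MATCH $url"
  fi
done
```

This prints `MATCH` for the first URL and `NOT MATCH` for the second, in line with the egrep run.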

After running a quick test here, the egrep results are:

$ while IFS= read -r x
>       do
>           if  echo "$x" | egrep '^https?://[^/]+(/(\?.*|index\.php(\?.*)?)?)?$' > /dev/null
>           then
>               echo MATCH "$x"
>           else
>               echo NOT MATCH "$x"
>           fi
>       done < data
MATCH http://site.com/index.php
MATCH http://site.com/
MATCH http://site.com
MATCH http://site.com/index.php?var=X
MATCH http://site.com/?var=X
MATCH http://site.com?var=X
MATCH https://site.com/index.php
MATCH https://site.com/
MATCH https://site.com
MATCH https://site.com/index.php?var=X
MATCH https://site.com/?var=X
MATCH https://site.com?var=X
MATCH http://www.site.com/index.php
MATCH http://www.site.com/
MATCH http://www.site.com
MATCH http://www.site.com/index.php?var=X
MATCH http://www.site.com/?var=X
MATCH http://www.site.com?var=X
MATCH https://www.site.com/index.php
MATCH https://www.site.com/
MATCH https://www.site.com
MATCH https://www.site.com/index.php?var=X
MATCH https://www.site.com/?var=X
MATCH https://www.site.com?var=X
NOT MATCH http://site.com/page.php
NOT MATCH http://site.com/page.php?var=X
NOT MATCH http://site.com/page
NOT MATCH http://site.com/page/
NOT MATCH http://site.com/page/index.php
NOT MATCH http://site.com/page?var=X
NOT MATCH http://site.com/page/?var=X
NOT MATCH https://site.com/page.php
NOT MATCH https://site.com/page.php?var=X
NOT MATCH https://site.com/page
NOT MATCH https://site.com/page/
NOT MATCH https://site.com/page/index.php
NOT MATCH https://site.com/page?var=X
NOT MATCH https://site.com/page/?var=X
NOT MATCH http://www.site.com/page.php
NOT MATCH http://www.site.com/page.php?var=X
NOT MATCH http://www.site.com/page
NOT MATCH http://www.site.com/page/
NOT MATCH http://www.site.com/page/index.php
NOT MATCH http://www.site.com/page?var=X
NOT MATCH http://www.site.com/page/?var=X
NOT MATCH https://www.site.com/page.php
NOT MATCH https://www.site.com/page.php?var=X
NOT MATCH https://www.site.com/page
NOT MATCH https://www.site.com/page/
NOT MATCH https://www.site.com/page/index.php
NOT MATCH https://www.site.com/page?var=X
NOT MATCH https://www.site.com/page/?var=X
answered 2012-12-18T19:53:13.750

Assuming you're doing this in PHP, you should use parse_url() (http://php.net/manual/en/function.parse-url.php) and then look at the path component.

<?php
$url = "http://example.com/index.php?page=1";
$path = parse_url($url, PHP_URL_PATH);
print "path=$path\n";
?>

Run it and you get:

path=/index.php

Once you have the path in $path, it's just a matter of comparing it to / or /index.php or whatever else you need. No regex required.
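The same "extract the path, then compare it" idea can be sketched in bash without a regex. This is only a rough analogue of parse_url(): it assumes well-formed `scheme://host[/path][?query][#fragment]` URLs, whereas PHP's parse_url() handles many more cases. Both helper names (`url_path`, `is_index_path`) are hypothetical.

```shell
#!/usr/bin/env bash
# url_path (hypothetical helper): prints the path component of $1,
# or an empty line if the URL has no path (e.g. http://site.com).
url_path() {
  local rest=${1#*://}    # drop the "http://" / "https://" prefix
  rest=${rest%%#*}        # drop any #fragment
  rest=${rest%%\?*}       # drop any ?query
  if [[ $rest == */* ]]; then
    printf '%s\n' "/${rest#*/}"   # everything from the first slash on
  else
    printf '\n'                   # host only, no path
  fi
}

# is_index_path (hypothetical helper): succeeds when the path is
# empty, "/" or "/index.php"; plain string comparison, no regex.
is_index_path() {
  case $(url_path "$1") in
    ''|/|/index.php) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `url_path 'http://example.com/index.php?page=1'` prints `/index.php`, the same path the PHP snippet reports, and `is_index_path 'http://site.com/page/index.php'` fails because the path is `/page/index.php`.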

answered 2012-12-18T21:12:41.927