python - Regex: match a string that starts with anything, followed by a hyphen

Question

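Suppose I have the following text:

BBC - Here is the text

How would I use a regular expression to test whether a string starts with "* - "?

And then remove the "* - ", leaving only "Here is the text"? (I am using Python.)

I use "*" because the string obviously won't start with "BBC - " every time; it could be some other substring.

Would this work?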

"^.* - "

Many thanks.

Answer:

m = re.search(r'^(.*? [-\xe2\u2014] )?(.*)', text)

This works. Thanks @xanatos!

score 2 · Accepted Answer

This is a "match everything before the first hyphen, and the hyphen itself" pattern:

/^[^-]*-\s*/

It reads as follows:

^      - starting from the beginning of the string...
[^-]*  - match any number (including zero) of non-hyphens, then...
-      - match hyphen itself, then...
\s*    - match any number (including zero) of whitespace

Then you can replace the string matched by the pattern with an empty string; the result of the replacement is probably all you need.
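In Python, that replacement might be sketched with re.sub as below. The helper name strip_prefix and the sample strings are illustrative assumptions, not part of the answer.

```python
import re

# Everything up to and including the first hyphen, plus any whitespace after it.
PREFIX = re.compile(r'^[^-]*-\s*')

def strip_prefix(text):
    """Remove the leading '<anything> - ' part (hypothetical helper name)."""
    return PREFIX.sub('', text, count=1)

print(strip_prefix("BBC - Here is the text"))  # Here is the text
print(strip_prefix("No separator here"))       # No separator here
```

Note that this pattern does not require spaces around the hyphen, so a string such as "well-known fact" would be reduced to "known fact".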

score 1 · Accepted Answer

Try this code:

import re

text = "BBC \xe2 abc - Here is the text"
m = re.search(r'^(.*? [-\xe2] )?(.*)', text, re.UNICODE)

# or equivalently
# m = re.match(r'(.*? [-\xe2] )?(.*)', text, re.UNICODE)

# You don't really need re.UNICODE (it is already the default for str
# patterns in Python 3), but if you want to use Unicode characters, it's
# better that à is considered a letter :-)

# group(1) contains the part before the hyphen
if m.group(1) is not None:
    print(m.group(1))

# group(2) contains the part after the hyphen, or the whole string
# if there is no hyphen
print(m.group(2))

Explanation of the regex:

^ is the beginning of the string (the match method always starts at the
  beginning of the string)
(...) creates a capturing group (something that will go in group(...))
(...)? is an optional group
[-\xe2] one character, either - or \xe2 (you can put any number of characters
        in the [], e.g. [abc] means a or b or c)
.*? [-\xe2] (there is a space after the ]) any characters followed by a space, a hyphen and a space
      the *? means that the * is "lazy", so it will try to match only the
      minimum possible number of characters; given ABC - DEF - GHI,
      .* - would match ABC - DEF - , while .*? - would match ABC - 

so

(.*? [-\xe2] )? the string may start with any characters followed by a hyphen;
         if it does, that part goes in group(1); if not, group(1) will be None
(.*) and it is followed by any characters. You don't need the
     $ (the end of the string, the opposite of ^) because * will
     always eat all the characters it can (it's a greedy operator)
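The difference between the greedy and the lazy quantifier can be checked with a short sketch like the one below. The sample strings are assumptions chosen for illustration, and the character class is reduced to a plain hyphen for brevity.

```python
import re

text = "ABC - DEF - GHI"

# Greedy: .* grabs as much as possible, so the match ends at the LAST " - ".
greedy = re.match(r'.* - ', text).group(0)
# Lazy: .*? grabs as little as possible, so the match ends at the FIRST " - ".
lazy = re.match(r'.*? - ', text).group(0)

print(repr(greedy))  # 'ABC - DEF - '
print(repr(lazy))    # 'ABC - '

# With the optional group, a string without a separator still matches:
# group(1) is None and group(2) holds the whole string.
m = re.match(r'(.*? - )?(.*)', "no separator")
print(m.group(1), repr(m.group(2)))  # None 'no separator'
```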

score 0 · Accepted Answer

/^.+-/ should work.

Here are test cases based on your requirements:

Pass: foo -

Pass: bar-

Pass: -baz-

Pass: *qux-

Pass: -------------

Fail: ****

Fail: -foobar
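The cases above could be reproduced in Python with a sketch like the following; the variable names are illustrative assumptions.

```python
import re

# At least one character, followed by a hyphen, anchored at the start.
pattern = re.compile(r'^.+-')

cases = ["foo -", "bar-", "-baz-", "*qux-", "-------------", "****", "-foobar"]
results = {s: bool(pattern.search(s)) for s in cases}

for s, ok in results.items():
    print("Pass:" if ok else "Fail:", s)
```

Because .+ needs at least one character before the hyphen, a string whose only hyphen is its first character (such as -foobar) does not match.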

score 0 · Accepted Answer

Use the ? operator:

'^(.+ [-] )?(.+)$'

You may want to make it more flexible about whitespace...

A quick and dirty test script (in PHP rather than Python, sorry!):

<?php
$string  = "BBC - This is the text.";
$pattern = '/^(.+ [-] )?(.+)$/';
preg_match($pattern, $string, $tokens);
var_dump($tokens);
?>

Output of the test script:

array(3) {
  [0] =>
  string(23) "BBC - This is the text."
  [1] =>
  string(6) "BBC - "
  [2] =>
  string(17) "This is the text."
}

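The first parenthesized group matches text at the start of the string consisting of one or more arbitrary characters, followed by a space, then a literal hyphen and another space. This sequence may or may not be present. The second group matches all the rest of the string, up to the end.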
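Since the question is about Python, a rough equivalent of the PHP test script might look like this. It is a sketch using the same pattern; the variable names are assumptions.

```python
import re

pattern = r'^(.+ [-] )?(.+)$'

m = re.match(pattern, "BBC - This is the text.")
print(repr(m.group(0)))  # 'BBC - This is the text.'
print(repr(m.group(1)))  # 'BBC - '
print(repr(m.group(2)))  # 'This is the text.'

# Without a separator, the optional group does not participate in the match.
plain = re.match(pattern, "This is the text.")
print(plain.group(1), repr(plain.group(2)))  # None 'This is the text.'

# .+ is greedy here, so with several separators group 1 runs to the last one.
multi = re.match(pattern, "ABC - DEF - GHI")
print(repr(multi.group(1)), repr(multi.group(2)))  # 'ABC - DEF - ' 'GHI'
```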
