I need to split a string on any non-alphanumeric character except / and -. For example, in preg_split():

/[^a-zA-Z0-9\/\-]/

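This works fine, but now I want to split the string at all of those points unless the character is found inside a URL (i.e. I want to keep URLs together). I consider a URL to be a whitespace-delimited substring that starts with http:// or https://. In other words: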

My string. https://my-url.com?q=3 More strings.

should be split into:

[0] My
[1] string
[2] https://my-url.com?q=3
[3] More
[4] strings

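I tried some naive things like /[^a-zA-Z0-9\/\-(https?\:\/\/.\s)]+/, but unfortunately I don't know how to do this outside a character class, and this obviously doesn't give me the result I want.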

I'm working in PHP right now and was hoping to just use preg_split(), but I'm open to better, more comprehensive approaches.

1 Answer

You can't just stuff things into the character class: everything inside it is treated as individual characters. What you would want is a negative lookbehind that ensures there is no https?:// before your match (separated from it only by non-whitespace characters). But that needs an unbounded variable-length lookbehind, which PCRE (the engine behind PHP's preg_* functions) does not support; .NET is one of the few engines that does. You could reverse the input, the pattern and the result to work around this, but that's a bit overkill. Instead, just switch from splitting to matching:

preg_match_all('~https?://\S*|[a-zA-Z0-9/-]+~', $input, $matches);

Now $matches[0] will contain your desired array.
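As a minimal sketch of the matching approach, here it is applied to the sample string from the question. The $tokens variable is only an illustrative name; the pattern is the one given above.

```php
<?php
// Sample input taken from the question.
$input = 'My string. https://my-url.com?q=3 More strings.';

// First alternative: a whole URL, i.e. http:// or https:// followed by
// everything up to the next whitespace character.
// Second alternative: a run of letters, digits, '/' and '-'.
// The URL alternative comes first so it wins wherever a URL starts.
preg_match_all('~https?://\S*|[a-zA-Z0-9/-]+~', $input, $matches);

// $matches[0] holds the full matches in order of appearance.
$tokens = $matches[0];
print_r($tokens);
```

This yields My, string, https://my-url.com?q=3, More and strings. One caveat of treating a URL as "everything up to the next whitespace": punctuation directly after a URL, such as a sentence-ending period, becomes part of the URL token.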

Note that you can change the delimiter to pretty much anything. This comes in handy if the pattern contains lots of forward slashes, because you then don't have to escape them. You also don't need to escape the hyphen if it's the last character in a character class; whether you escape it anyway is a matter of taste.
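To illustrate the point about delimiters and the trailing hyphen, the sketch below writes the same pattern three ways and runs each against the question's sample string. The array keys and the $patterns and $results names are made up for this example.

```php
<?php
$input = 'My string. https://my-url.com?q=3 More strings.';

$patterns = [
    // '/' as delimiter: every literal slash in the pattern must be escaped.
    'slash_delimiter' => '/https?:\/\/\S*|[a-zA-Z0-9\/-]+/',
    // '~' as delimiter: slashes can be written as-is.
    'tilde_delimiter' => '~https?://\S*|[a-zA-Z0-9/-]+~',
    // Hyphen escaped even though it is last in the class: same meaning.
    'escaped_hyphen'  => '~https?://\S*|[a-zA-Z0-9/\-]+~',
];

$results = [];
foreach ($patterns as $name => $pattern) {
    preg_match_all($pattern, $input, $m);
    $results[$name] = $m[0];
}

// All three spellings should produce the same list of tokens.
var_dump($results['slash_delimiter'] === $results['tilde_delimiter']);
var_dump($results['tilde_delimiter'] === $results['escaped_hyphen']);
```

Both comparisons should come out true, since the three strings describe the same regular expression and differ only in how it is written down.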

answered 2013-04-17T19:01:58.487