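Hi, I'm new to regular expressions. I'm trying to use \s{2,} to capture the runs of spaces in the junk text, but without touching the spaces inside "url":"https://x.com/a/C25/XPS - Connection - May 2013.docx". In my current scenario the URL has not been encoded, so it may contain spaces.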

Sample text:

"startofjunk      junkjunkjunkjunk","url":"https://x.com/a/C25/XPS  - Connection - May 2013.docx","contentsource":"AX","returpath":null,"detailpath":"https://ax.sample.com/Rep>ositories/form.aspx?path=C25/96/99&mode=Read","detailspath2":"samplepath"

Desired text:

"startofjunk junkjunkjunkjunk","url":"https://x.com/a/C25/XPS  - Connection - May 2013.docx","contentsource":"AX","returpath":null,"detailpath":"https://ax.sample.com/Rep>ositories/form.aspx?path=C25/96/99&mode=Read","detailspath2":"samplepath"

Please help. Thanks.

2 Answers

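Use a lookahead to assert that your spaces occur before "url". Also use a lookbehind, so that your whole match consists only of the extra spaces: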

(?<=\s)\s+(?=.*"url":)

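To remove the extra spaces, replace the whole match with an empty string (i.e. nothing), or delete the whole match if your application language allows it.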
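As a rough illustration of the lookaround approach above, here is a minimal Python sketch. The names `EXTRA_WS_BEFORE_URL`, `collapse_before_url`, and `sample` are made up for this example, and the sample is a shortened form of the text from the question.

```python
import re

# Surplus whitespace: preceded by a whitespace character (lookbehind) and
# followed, somewhere later on the same line, by the "url": key (lookahead).
EXTRA_WS_BEFORE_URL = re.compile(r'(?<=\s)\s+(?=.*"url":)')


def collapse_before_url(text: str) -> str:
    """Delete the surplus whitespace that occurs before the "url": key."""
    return EXTRA_WS_BEFORE_URL.sub("", text)


# Shortened version of the question's sample (double space inside the URL).
sample = (
    '"startofjunk      junkjunkjunkjunk",'
    '"url":"https://x.com/a/C25/XPS  - Connection - May 2013.docx",'
    '"contentsource":"AX"'
)

print(collapse_before_url(sample))
```

Note that the lookahead only checks that some "url": appears later on the line. The sketch therefore assumes a single "url" key per line with the junk in front of it; whitespace after that key, including the spaces inside the URL, is left alone.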

Answered 2013-06-04T12:30:19.570

Description

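This regular expression finds every run of multiple whitespace characters and replaces it with a single one, while bypassing the url section. In a run of X whitespace characters, the first is captured in group 1, which is fed to the output as \1, and the rest are dropped. The URL section is bypassed because, when it is encountered as the second branch of the | (or) alternation, it is captured in group 2 and then injected back into the output by \2 in the replacement.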

Regex: (\s)\s*|("url":"[^"]*"), replace with: \1\2
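For comparison, the same pattern and replacement can be tried in Python. This is only a sketch: `WS_OR_URL`, `collapse_outside_url`, and `sample` are illustrative names, and the sample is a shortened form of the question's text, which has a double space inside the URL.

```python
import re

# Branch 1: a whitespace run, keeping only its first character in group 1.
# Branch 2: the whole "url":"..." pair, kept intact in group 2.
WS_OR_URL = re.compile(r'(\s)\s*|("url":"[^"]*")')


def collapse_outside_url(text: str) -> str:
    """Collapse whitespace runs to one character, skipping the "url" value."""
    # Whichever group did not take part in a match expands to an empty
    # string (Python 3.5+), so \1\2 re-emits exactly what was captured.
    return WS_OR_URL.sub(r"\1\2", text)


sample = (
    '"startofjunk      junkjunkjunkjunk",'
    '"url":"https://x.com/a/C25/XPS  - Connection - May 2013.docx",'
    '"contentsource":"AX"'
)

print(collapse_outside_url(sample))
```

Because the second branch consumes the entire "url" value in one match, the whitespace inside it is never offered to the first branch.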

Source string

"startofjunk        junkjunkjunkjunk","url":"https://x.com/a/C25/XPS - Connection - May 2013.docx","contentsource":"AX","returpath":null,"detailpath":"https://ax.sample.com/Rep>ositories/form.aspx?path=C25/96/99&mode=Read","detailspath2":"samplepath"

PHP example

This PHP example is included simply to show how the regular expression works.

<?php
$sourcestring="your source string";
// \2 re-emits the captured "url" pair; with '\1' alone it would be deleted.
echo preg_replace('/(\s)\s*|("url":"[^"]*")/im','\1\2',$sourcestring);
?>

$sourcestring after replacement:
"startofjunk junkjunkjunkjunk","url":"https://x.com/a/C25/XPS - Connection - May 2013.docx","contentsource":"AX","returpath":null,"detailpath":"https://ax.sample.com/Rep>ositories/form.aspx?path=C25/96/99&mode=Read","detailspath2":"samplepath"
Answered 2013-06-04T05:03:07.387