c# - Regex to remove a dynamic string and keep the rest

Question

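I need to remove everything between product/ and /i from http://example.com/media/catalog/product/cache/1/thumbnail/56x/9df78eab33525d08d6e5fb8d27136e95/i/m/images_3.jpg, keeping http://example.com/media/catalog/product/i/m/images_3.jpg, using either a regular expression or C#. Those are the options available in the crawler application. Please help.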

score 1 · Accepted Answer

var input = "http://example.com/media/catalog/product/cache/1/thumbnail/56x/9df78eab33525d08d6e5fb8d27136e95/i/m/images_3.jpg";
var re = new Regex("^(.+/product)/.+(/i/.+)$");
var m = re.Match(input);
if (!m.Success) throw new Exception("does not match");
var result = m.Groups[1].Value + m.Groups[2].Value;
//result = "http://example.com/media/catalog/product/i/m/images_3.jpg"

score 0 · Accepted Answer

string str = "http://example.com/media/catalog/product/cache/1/thumbnail/56x/9df78eab33525d08d6e5fb8d27136e95/i/m/images_3.jpg";
int prodIndex = str.IndexOf("/product/");
int iIndex = str.IndexOf("/i/");
string newStr = str.Substring(0, prodIndex + "/product/".Length)
              + str.Substring(iIndex + 1);

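Here is a more general example using a regular expression. It only looks for the part after the 32-character hash, rather than assuming that part is /i/: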

string str = "http://example.com/media/catalog/product/cache/1/thumbnail/56x/9df78eab33525d08d6e5fb8d27136e95/i/m/images_3.jpg";
var match = Regex.Match(str, @"(.*/product/).*/.{32}/(.*)");
var newStr = match.Groups[1].Value + match.Groups[2].Value;
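
The same idea can also be written as a single replacement, which has the side effect of leaving URLs that do not match untouched instead of producing an empty string. This is only a sketch under a few assumptions: the class name ThumbnailUrl and the method StripCache are made up for illustration, and the hash is assumed to be exactly 32 hexadecimal characters sitting in its own path segment after /product.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answers.
static class ThumbnailUrl
{
    // Assumption: the segment to drop starts right after "/product" and ends
    // with a 32-character hex hash that is followed by another "/".
    private static readonly Regex CacheSegment =
        new Regex(@"(?<=/product)/.*/[0-9a-fA-F]{32}(?=/)");

    // Removes the first matching cache segment; if nothing matches,
    // Regex.Replace returns the input string unchanged.
    public static string StripCache(string url)
    {
        return CacheSegment.Replace(url, "", 1);
    }
}
```

Calling ThumbnailUrl.StripCache on the URL from the question should give http://example.com/media/catalog/product/i/m/images_3.jpg. Restricting the hash to hex characters is a little stricter than .{32}, so it is less likely to match an ordinary 32-character directory name by accident.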
