php - Removing duplicate data in a news file

Question

We have some news release data in the following format, where \t is an actual tab character.

Headline\tDate\tNews

The problem is that some past issues left duplicate or extra fields in the data, like this:

Government Shutdown Latest News {null}{10/15/2013}  {10/15/2013}    words words words.
Email Flow in Exchange  {null}{10/17/2013}  {10/17/2013}    words words words....
Should This be banned?  {null}{10/23/2013}  {10/23/2013}    words words words....

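I need to remove the first braced field, {null}, and the third (duplicate) field, along with the tab that follows the third field.

So each line of this data should originally have looked like this: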

Government Shutdown Latest News    {10/15/2013}    words words words....
Email Flow in Exchange    {10/17/2013}    words words words....
Should This be banned?    {10/23/2013}    words words words....

I can't manage to remove only those two fields and the tab. My pattern matches all of the braced fields:

preg_replace('/\{.*?\}(?={)|\{.*?\}\t/', '', $text);
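A minimal reproduction of the over-matching, assuming the fields are separated by real tab characters as described above. `$line` and `$overMatched` are illustrative names, and the sample row is rebuilt from the first line of the data:

```php
<?php
// Sample row rebuilt from the data above; "\t" inside double quotes is a real tab.
$line = "Government Shutdown Latest News\t{null}{10/15/2013}\t{10/15/2013}\twords words words.";

// The attempted pattern. The second alternative matches ANY braced field that is
// followed by a tab, so both date fields are removed, not just the duplicate.
$overMatched = preg_replace('/\{.*?\}(?={)|\{.*?\}\t/', '', $line);

// Shows the tab as a visible "\t" escape so the lost date field is easy to spot.
echo addcslashes($overMatched, "\t"), PHP_EOL;
```

The first alternative removes {null} as intended, but nothing in the second alternative distinguishes the date that should stay from the one that should go.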

score 3 · Accepted Answer

You can use a negative lookbehind for the job.

(?<![^\s]){[^}]*}\t?

Regular expression:

(?<!           look behind to see if there is not:
 [^\s]         any character except: whitespace (\n, \r, \t, \f, and " ")
)              end of look-behind
{              '{'
 [^}]*         any character except: '}' (0 or more times)
}              '}'
\t?            '\t' (tab) (optional)

Note: you can get away without escaping { } here.

See the working demo and the regex101 demo.
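As a rough sketch of how this pattern could be plugged into preg_replace, assuming real tab separators as in the question (the helper name `stripExtraFields` and the sample `$line` are made up for illustration):

```php
<?php
// Hypothetical helper; the pattern itself is the one from this answer.
function stripExtraFields(string $text): string
{
    // Remove every {...} field that is NOT preceded by a non-whitespace
    // character (i.e. it follows whitespace or starts the text), plus one
    // trailing tab if present. Fall back to the input if PCRE reports an error.
    return preg_replace('/(?<![^\s]){[^}]*}\t?/', '', $text) ?? $text;
}

// Sample row rebuilt from the question; "\t" inside double quotes is a real tab.
$line = "Email Flow in Exchange\t{null}{10/17/2013}\t{10/17/2013}\twords words words....";

// Shows tabs as visible "\t" escapes.
echo addcslashes(stripExtraFields($line), "\t"), PHP_EOL;
```

The first date survives because it is preceded by the `}` of {null}, so the lookbehind rejects it. One caveat: a braced token inside the news text that follows a space or tab would be removed as well, so this is only safe if the body text never contains such tokens.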

score 2 · Accepted Answer

You can try this pattern:

$result = preg_replace('~[^\s}]\s*\K{null}|{[0-9]{2}/[0-9]{2}/[0-9]{4}}\t(?!\s*[^{])~', '', $text);
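For illustration, here is that call applied to two rows from the question, assuming real tab separators and one record per line (the contents of `$text` are rebuilt from the sample data, not taken from the actual file):

```php
<?php
// Two sample rows joined with a newline; "\t" inside double quotes is a real tab.
$text = "Government Shutdown Latest News\t{null}{10/15/2013}\t{10/15/2013}\twords words words.\n"
      . "Should This be banned?\t{null}{10/23/2013}\t{10/23/2013}\twords words words....";

// First alternative: drop {null}; \K keeps the preceding character and whitespace.
// Second alternative: drop a date field plus its tab only when the next
// character is another "{", i.e. when a second date follows directly.
$result = preg_replace('~[^\s}]\s*\K{null}|{[0-9]{2}/[0-9]{2}/[0-9]{4}}\t(?!\s*[^{])~', '', $text);

// Shows tabs as visible "\t" escapes; newlines are left as they are.
echo addcslashes($result, "\t"), PHP_EOL;
```

Unlike the lookbehind version, this one removes the first of the two dates and keeps the second. Since the two dates are identical, the cleaned line comes out the same. Because it only targets {null} and date-shaped fields, other braced text in the news body is left alone.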
